Sr Staff Site Reliability Engineer (Cortex Data Lake)

Palo Alto Networks • Santa Clara, CA

$126,000 - $203,500

About The Position

The Senior Staff Site Reliability Engineer (SRE) at Palo Alto Networks will play a crucial role in supporting the services running on a large infrastructure, particularly focusing on automation, architecture, performance, observability, troubleshooting, security, and reliability. This position is integral to ensuring that applications are production-ready, scalable, and reliable, while also contributing to the success of SRE and DevOps initiatives.

Requirements

4+ years as an engineer in Infrastructure, Operations, DevOps, or System Engineering
3+ years building highly available, scalable cloud-native applications on AWS or GCP
BS or MS in Computer Science or a related field, or equivalent professional or military experience, required
Expertise in configuration management with a framework such as Ansible, Terraform, or Helm
Passion for infrastructure and monitoring as code
Solid experience in container workloads and Kubernetes
Familiarity with PKI and networking concepts
In-depth knowledge of different security controls (App-ID, User-ID, security profiles, URL categories, content, SSL decryption, firewall MFA, etc.)
Linux administration, internals, and network troubleshooting
Proficiency with programming languages like Golang or Python along with shell scripting to automate tasks
Proficiency with CI/CD pipelines, ArgoCD, and GitLab CI/CD
Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions
Excellent written and verbal communication, able to collaborate and rally support
Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
Ready to understand and dissect new technology stacks quickly

Nice To Haves

Experience with managing Kafka is a plus
Knowledge of GitLab Runners is a plus

Responsibilities

Contribute to the success of SRE and DevOps initiatives
Develop expertise in new technologies
Work with developers, researchers, data scientists, and security experts
Design, build and operate reliable, secure Cloud infrastructure
Ensure that applications are production-ready, scalable, and reliable
Develop tools and automation frameworks
Automate the deployment of robust services
Orchestrate end-to-end monitoring and alerting
Participate in the on-call rotation with SRE and Dev teams
Lead root cause analysis of critical business and production issues

Benefits

FLEXBenefits wellbeing spending account
Mental and financial health resources
Personalized learning opportunities
Restricted stock units
Bonus opportunities

What This Job Offers

Job Type

Full-time

Career Level

Senior

Industry

Professional, Scientific, and Technical Services

Education Level

Bachelor's degree
