Principal Site Reliability Engineer (CIPE)

Palo Alto Networks

79d•Onsite

About The Position

Our Mission

At Palo Alto Networks®, we’re united by a shared mission: to protect our digital way of life. We thrive at the intersection of innovation and impact, solving real-world problems with cutting-edge technology and bold thinking. Here, everyone has a voice, and every idea counts. If you’re ready to do the most meaningful work of your career alongside people who are just as passionate as you are, you’re in the right place.

Who We Are

In order to be the cybersecurity partner of choice, we must trailblaze the path and shape the future of our industry. This is something our employees work at each day and is defined by our values: Disruption, Collaboration, Execution, Integrity, and Inclusion. We weave AI into the fabric of everything we do and use it to augment the impact every individual can have. If you are passionate about solving real-world problems and ideating beside the best and the brightest, we invite you to join us!

We believe collaboration thrives in person. That’s why most of our teams work from the office full time, with flexibility when it’s needed. This model supports real-time problem-solving, stronger relationships, and the kind of precision that drives great outcomes.

Job Summary

Note: This role requires US Citizenship.

Your Career

As a Principal Site Reliability Engineer, you will serve as the technical authority for our cloud-native infrastructure. You aren't just managing servers; you are architecting the reliability, scalability, and security of a massive Kubernetes ecosystem. We are looking for a visionary who balances deep systems expertise with a modern, AI-augmented development workflow. You will lead the evolution of our GKE (Google Kubernetes Engine) environment, championing GitOps best practices and integrating advanced security protocols directly into our delivery pipelines.

Requirements

Kubernetes Mastery: Expert-level experience managing production K8s workloads (preferably within GKE, but will also consider EKS). Deep understanding of Networking, Storage, and RBAC.
CI/CD & GitOps: Hands-on expertise with ArgoCD and modern pipeline runners (GitHub Actions, GitLab CI, or Jenkins).
Programming: Proficient in Python for systems programming and automation.
Security Mindset: Proven experience integrating security scanning and compliance checks within a containerized environment.
Modern Workflow: Experience with (or a strong desire to use) AI pair-programming tools like Cursor and Claude to multiply personal and team productivity.

Responsibilities

Infrastructure Leadership: Architect and oversee large-scale Kubernetes clusters in GKE, ensuring high availability, performance tuning, and cost optimization.
GitOps & Orchestration: Design and refine complex CI/CD lifecycles using ArgoCD, moving toward a fully declarative infrastructure-as-code model.
Security Engineering: Implement and manage security scanning tools (e.g., Prisma Cloud, Snyk, or GKE native security) to ensure container integrity and shift-left security compliance.
Automation & Tooling: Develop sophisticated automation scripts and internal tools using Python to eliminate manual toil and improve system observability.
AI-Driven Development: Lean into the future of engineering by utilizing Cursor and Claude to accelerate coding, debugging, and documentation tasks.
Incident Management: Act as a final escalation point for complex infrastructure outages, conducting blameless post-mortems to drive systemic improvements.
