Principal Platform Site Reliability Engineer (SASE Central Cloud Platforms)

Palo Alto Networks (posted 1 day ago)

$140,000 - $230,000/Yr

Full-time • Mid Level

Hybrid • Santa Clara, CA

5,001-10,000 employees

Palo Alto Networks runs a large hybrid infrastructure and is one of the largest GCP customers. As a Site Reliability Engineer, you will be part of a team supporting the services that run on this infrastructure, covering automation, architecture, performance, metrics, troubleshooting, security, and reliability.

Central Infrastructure & Platform Engineering Team | Santa Clara, CA (Hybrid/Onsite as applicable)

We’re hiring a Sr Staff Platform SRE for our SASE central cloud platform team. We’re looking for a well-rounded platform SRE who can architect, build, and operate cloud-native infrastructure at very large scale across GCP, AWS, and OCI. This is a unique opportunity to work at massive scale: the platforms you influence are tied to hundreds of millions of dollars in annual cloud spend, and your work will directly affect reliability, efficiency, developer velocity, and operational excellence across the organization.

Responsibilities

Act as the architect for infrastructure owned by the team: plan ahead and design in line with scale requirements.
Design, develop, and deliver infrastructure components for the platforms owned by the team.
Own Infrastructure as Code (IaC), Monitoring as Code (MaC), and Policy as Code (PaC) components, and build a golden path that encodes best practices for future platforms.
Strive for autonomy with an automation-first mindset, including modern AI-driven approaches.
Redefine and continuously update modern CI/CD practices for cloud-native workloads.
Perform on-call duties and reduce on-call toil through automation, AI agents, analyzers, and self-healing patterns.
Support internal platform users as a forward-deployed engineer, close the feedback loop, and modernize the platform based on user needs.
Maintain a security-first mindset without compromising reliability or operability.
Design cost-effective infrastructure solutions across AWS, GCP, and OCI, including cost governance, capacity planning, and efficiency improvements.

Qualifications

BS or MS in Computer Science or a related field, or equivalent professional experience.
Expert knowledge of Kubernetes and CNCF ecosystem tools such as Helm, Prometheus, Backstage, Istio, and Crossplane.
Strong mastery of Terraform, including building reusable modules and designing complex infrastructure offerings that operate in protected or restricted environments.
Strong foundational knowledge of operating and scaling cloud-native workloads using tools such as KEDA, Karpenter, and NAP.
Ability to architect CI/CD infrastructure for cloud-native workloads (primarily Go and Python) and to build DevSecOps pipelines.
Programming skills in Go and Python; scripting experience in Bash.
Strong knowledge of Argo CD, including managing and scaling thousands of deployments across Kubernetes clusters and multiple clouds.
Deep experience in cost governance and optimization at scale, including allocation models, anomaly detection, efficiency recommendations, and guardrails across cloud and Kubernetes workloads.
Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions.
Excellent written and verbal communication skills, with the ability to collaborate, rally support, and partner across platform, security, and application engineering teams.
Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive.
