Palo Alto Networks • Posted 1 day ago
$140,000 - $230,000/Yr
Full-time • Mid Level
Hybrid • Santa Clara, CA
5,001-10,000 employees

Palo Alto Networks runs a large hybrid infrastructure and is one of the largest GCP customers. As a Site Reliability Engineer, you will be part of a team supporting the services running on this infrastructure. This includes automation, architecture, performance, metrics, troubleshooting, security, and reliability.

Central Infrastructure & Platform Engineering Team | Santa Clara, CA (Hybrid/Onsite as applicable)

We’re hiring a Sr Staff Platform SRE for our SASE central cloud platform team. We’re looking for a well-rounded platform SRE who can architect, build, and operate cloud-native infrastructure at very large scale across GCP, AWS, and OCI. This is a rare opportunity to work at that scale: the platforms you’ll influence are tied to hundreds of millions of dollars of annual cloud spend, and your work will directly affect reliability, efficiency, developer velocity, and operational excellence across the organization.

  • Act as an architect for infrastructure owned by the team: plan ahead and design in line with scale requirements.
  • Design, develop, and execute infrastructure components for the platforms owned by the team.
  • Own Infrastructure as Code (IaC), Monitoring as Code (MaC), and Policy as Code (PaC) components, and build the golden path for future platforms using best practices.
  • Strive for autonomy with an automation-first mindset, including modern AI-driven approaches.
  • Redefine and continuously update modern CI/CD practices for cloud-native workloads
  • Perform on-call duties and reduce on-call toil through automation, AI agents, analyzers, and self-healing patterns
  • Support internal platform users as a forward-deployed engineer, close the feedback loop, and modernize the platform based on user needs
  • Maintain a security-first mindset without compromising reliability and operability
  • Design cost-effective infrastructure solutions across AWS, GCP, and OCI, including cost governance, capacity planning, and efficiency improvements
  • BS or MS in Computer Science, a related field, or equivalent professional experience
  • Expert knowledge of Kubernetes and CNCF ecosystem tools such as Helm, Prometheus, Backstage, Istio, and Crossplane.
  • Strong mastery of Terraform: building reusable modules, designing complex infrastructure offerings operating in protected/restricted environments
  • Strong foundational knowledge of operating and scaling cloud-native workloads using KEDA, Karpenter, NAP, etc.
  • Ability to architect CI/CD infrastructure for cloud-native workloads (primarily Go and Python) and build DevSecOps pipelines.
  • Programming skills in Go and Python; scripting experience with Bash
  • Strong knowledge of Argo CD, including controlling and scaling thousands of deployments across Kubernetes and multiple clouds.
  • Deep experience in cost governance and optimization at scale, including allocation models, anomaly detection, efficiency recommendations, and guardrails across cloud and Kubernetes workloads.
  • Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions
  • Excellent written and verbal communication, able to collaborate and rally support
  • Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
  • Strong communication skills and the ability to partner across platform, security, and application engineering teams