Principal Site Reliability Engineer (AIOps)

Palo Alto Networks
$151,600 - $245,300
Onsite

About The Position

At Palo Alto Networks®, the mission is to protect our digital way of life by thriving at the intersection of innovation and impact, solving real-world problems with cutting-edge technology and bold thinking. The company values Disruption, Collaboration, Execution, Integrity, and Inclusion, and integrates AI into its operations. Collaboration is emphasized: most teams work from the office full time, with flexibility when needed, an arrangement meant to support real-time problem-solving, stronger relationships, and precision.

This role is for a Site Reliability Engineer at Palo Alto Networks, which operates a large hybrid infrastructure and is a significant GCP customer. The position involves supporting services on this infrastructure, with a focus on automation, architecture, performance, metrics, troubleshooting, security, and reliability. The technology stack includes Kubernetes, Docker, GCP, AWS, Ansible, Terraform, Vault, GitLab, Spinnaker, TensorFlow, Datadog, Elasticsearch, Kafka, Hadoop, MySQL, Percona, MongoDB, Python, and Go, and candidates are expected to learn whatever other technologies the work requires.

Requirements

  • BS or MS in Computer Science, a related field, or equivalent professional experience
  • Expertise in configuration management with a framework such as Ansible, Terraform, or Helm
  • Experience in Production Engineering, DevOps, or Site Reliability
  • Expertise in private or public cloud
  • Strong Linux administration, internals, and network troubleshooting
  • Proficiency in automating tasks with Python, Go, and shell scripting
  • Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions
  • Excellent written and verbal communication, with the ability to collaborate and rally support
  • Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
  • Passion for infrastructure and monitoring as code
  • Ready to understand and dissect new technology stacks quickly

Nice To Haves

  • Familiarity with CI/CD pipelines; GitLab and GitHub preferred

Responsibilities

  • Contribute to the success of SRE and DevOps
  • Develop expertise in new technologies
  • Work with developers, researchers, data scientists, and security experts
  • Design, build, and operate reliable, secure cloud infrastructure
  • Ensure that applications are production-ready, scalable, and reliable
  • Develop tools and automation frameworks
  • Automate the deployment of robust services
  • Orchestrate end-to-end monitoring and alerting
  • Participate with SRE and Dev teams in the on-call rotation
  • Lead root cause analysis of critical business and production issues
  • Mentor and champion SRE culture
  • Participate in design reviews