Palo Alto Networks - posted about 2 months ago
$173,000 - $230,000/Yr
Full-time • Principal
Onsite • Santa Clara, CA
5,001-10,000 employees

The Prisma Cloud team delivers the industry’s most advanced Cloud SecOps platforms. This is an opportunity to develop your DevOps skills at scale, learn new technologies and operational processes, and advance your career in a large-scale enterprise SaaS production environment. As a member of the Prisma Cloud DevOps team, you will operate and maintain large-scale AWS and GCP production environments, including databases, Infrastructure as Code, and observability systems. To succeed in this role, you will need knowledge of scaling and maintaining SQL and NoSQL databases, Kafka, and Terraform, and of supporting all aspects of Kubernetes, as well as observability and monitoring tools and practices.

  • Gain Cloud Experience - Utilize your prior experience in maintaining cloud platforms to optimize our infrastructure
  • Manage Incidents - Participate in incident management, following established processes to ensure prompt resolution of system issues, minimizing impact on services
  • Implement CI/CD - Develop and maintain application deployment configuration, using tools such as Terraform and Helm
  • Continuously Improve - Stay up-to-date with cutting-edge technologies, evaluate their potential impact on our operations, and implement them when appropriate
  • Participate in On-Call - Join an on-call rotation with our DevOps teams to provide follow-the-sun operational coverage for our production SaaS product
  • Collaborate - Work with our Engineering teams to influence the operability of the product and ensure the reliability and availability of our services
  • DevOps/SRE Experience - 8+ years of experience as a DevOps/SRE engineer with a passion for technology and a strong motivation for high reliability at the service level
  • Data Platform Experience - Design, deployment and maintenance of SQL DBs (e.g. PostgreSQL, MySQL) and NoSQL databases (e.g. Redis, Cassandra, Hadoop). Experience with Kafka is preferred.
  • Linux Experience - A strong understanding of Linux internals allowing for quick troubleshooting
  • Incident and Alerts Management - Clear understanding of incident and alerts management in Site Reliability Engineering
  • Troubleshooting - Ability to effectively troubleshoot and address emerging and complex problems
  • Cloud Proficiency - Proficiency in Google Cloud Platform, Amazon Web Services, or a similar public cloud platform
  • Kubernetes and Docker - Experience with Docker and Kubernetes for container orchestration
  • Scripting and Automation - Proficiency in Python programming and Linux shell commands
  • Terraform - Experience with Terraform for infrastructure as code
  • Security - Familiarity with security concepts and best practices
  • Observability - Experience with observability and incident response tools
  • Communication Skills - Effective communication and interpersonal skills, with the ability to work with and coordinate between multiple teams