Staff Site Reliability Engineer (SRE), Agile

Okta
San Francisco, CA
Hybrid

About The Position

We are seeking an experienced Staff Site Reliability Engineer to join our Infrastructure Platform AGILE SRE team. This role focuses on providing cross-functional support, enabling teams to build critical infrastructure while strengthening our internal tooling and operational capabilities. You will work closely with other Infrastructure Operations teams to diagnose, troubleshoot, and resolve complex infrastructure challenges by building tooling and designing clever solutions.

Requirements

  • 7+ years of Site Reliability Engineering or equivalent systems administration experience
  • Proficiency with Kubernetes and container orchestration
  • Strong Linux/Unix systems administration background
  • Good understanding of CI/CD and deployment strategies
  • Good grasp of networking concepts
  • Experience with infrastructure as code, infrastructure troubleshooting, and general architecture
  • Excellent communication and documentation skills

Nice To Haves

  • Kubernetes, Terraform, Golang, Python
  • Experience working across multiple teams in a cross-functional capacity
  • Familiarity with compliance and change management processes

Responsibilities

  • Investigate and resolve infrastructure issues reported by internal teams
  • Provide technical guidance and support across multiple technical domains
  • Contribute to runbooks, documentation, and knowledge sharing
  • Mentor junior team members on SRE best practices and troubleshooting methodologies
  • Identify and implement improvements to monitoring, alerting, and incident response processes

Benefits

  • Health, dental, and vision insurance
  • 401(k)
  • Flexible spending account
  • Paid leave (including PTO and parental leave)