Manager, Site Reliability Engineering

Okta
San Francisco, CA
$204,000 - $281,000
Hybrid

About The Position

Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence. This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

This position requires 2 days a week in our San Francisco office.

The IDaaS Site Reliability Engineering Group

Okta authenticates, authorizes, and provisions millions of users a day. The service is hosted on Amazon Web Services (AWS) across multiple availability zones and geographically separated regions, and is designed for high throughput and 99.999% availability. We're looking for a technical leader to help us continue to scale the service with great people and reliable, cost-effective, and efficient infrastructure, processes, and tooling.

As the Manager of Infrastructure Platform and Shared Services, you will oversee multiple teams focused on edge networking, the Kubernetes (K8s) platform, CI/CD, observability, and automation platforms and tooling.

Requirements

  • 3+ years of experience in technical leadership & people management
  • Extensive experience using Agile and DevOps methodologies to build product infrastructure and shared services at scale
  • Experience running large-scale infrastructure platforms supporting a SaaS/cloud service in a public cloud, preferably AWS
  • Strong expertise in cloud-native architectures, containerization (Kubernetes), IaC (Terraform), and CI/CD pipelines
  • Strong background and hands-on experience in software development, PaaS, and automation
  • Deep experience building and operating observability platforms and monitoring tools (Grafana, Splunk, APM, etc.) in a large-scale environment
  • Effective verbal and written communication and interpersonal skills
  • Degree in Computer Science or a related field, or equivalent experience
  • This position requires the ability to access federal environments and/or have access to protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire.

Nice To Haves

  • Experience supporting a multi-cloud environment

Responsibilities

  • Manage a team of SREs supporting the workloads and teams behind our IDaaS platform.
  • Drive the microservice journey, DevOps maturity, and workload reliability in tandem with architects and teams across the organization.
  • Accelerate the velocity of SRE and product engineering by developing powerful tooling, intuitive self-service capabilities, and robust self-healing patterns.
  • Lead, mentor, and grow a high-performing team of engineers and managers across platform, infrastructure, and shared services domains.
  • Perform engineering design evaluations and ensure the completion of projects within resource, budget, and scheduling constraints.
  • Improve SDLC processes for cloud infrastructure as code, including the maturity of CI/CD pipelines and change and release management.
  • Manage service and business expectations and prioritize resource allocation.
  • Maintain a deep knowledge of industry best practices, evolving trends, and technologies.

Benefits

  • health, dental and vision insurance
  • 401(k)
  • flexible spending account
  • paid leave (including PTO and parental leave)