Manager, Site Reliability Engineering

Okta • San Francisco, CA

Hybrid

About The Position

Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence. This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

This position requires 2 days a week in our San Francisco office.

The IDaaS Site Reliability Engineering Group

Okta authenticates, authorizes, and provisions millions of users a day. The service is hosted on Amazon Web Services (AWS) across multiple availability zones and geographically separated regions. The service is designed for high throughput and 99.999% availability. We're looking for a technical leader to help us continue to scale the service with great people and reliable, cost-effective, and efficient infrastructure, processes, and tooling.

As the Manager of Infrastructure Platform and Shared Services, you will oversee multiple teams focused on Edge networking, K8s platform, CI/CD, Observability, and automation platform & tooling.

Requirements

3+ years of experience in technical leadership & people management
Extensive experience using Agile and DevOps methodologies to build product infrastructure and shared services at scale
Experience running large-scale infrastructure platforms supporting a SaaS/Cloud service in a public Cloud, preferably AWS.
Strong expertise in cloud-native architectures, containerization (Kubernetes), IaC (Terraform), and CI/CD pipelines
Strong background and hands-on experience in software development, PaaS, and automation
Deep experience building and operating observability platforms and monitoring tools (Grafana, Splunk, APM, etc.) in a large-scale environment
Effective verbal and written communication and interpersonal skills
Degree in Computer Science or a related field, or equivalent experience
This position requires the ability to access federal environments and/or have access to protected federal data.
As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire.

Nice To Haves

Experience supporting a multi-Cloud environment is a plus.

Responsibilities

Manage a team of SREs supporting the workloads and teams behind our IDaaS platform.
Drive the microservice journey, DevOps maturity, and workload reliability in tandem with architects and teams across the organization.
Accelerate the velocity of SRE and product engineering by developing powerful tooling, intuitive self-service capabilities, and robust self-healing patterns.
Lead, mentor, and grow a high-performing team of engineers and managers across platform, infrastructure, and shared services domains.
Perform engineering design evaluations and ensure the completion of projects within resource, budget, and scheduling constraints.
Improve SDLC processes for Cloud infrastructure as code, including the maturity of CI/CD pipelines and change and release management
Manage service and business expectations and prioritize resource allocation
Maintain a deep knowledge of industry best practices, evolving trends, and technologies