Director, Site Reliability Engineering - Infrastructure Platform

Okta - San Francisco, CA
$266,000 - $398,000

About The Position

Okta is The World’s Identity Company. We free everyone to safely use any technology, anywhere, on any device or app. Our flexible and neutral products, Okta Platform and Auth0 Platform, provide secure access, authentication, and automation, placing identity at the core of business security and growth.

At Okta, we celebrate a variety of perspectives and experiences. We are not looking for someone who checks every single box - we’re looking for lifelong learners and people who can make us better with their unique experiences. Join our team! We’re building a world where Identity belongs to you.

The Infrastructure Platform and Shared Services Team

Okta authenticates, authorizes, and provisions millions of users a day. The service is hosted on Amazon Web Services (AWS) across multiple availability zones and geographically separated regions, and it is designed for high throughput and 99.999% availability. We're looking for a technical leader to help us continue to scale the service with great people and reliable, cost-effective, and efficient infrastructure, processes, and tooling.

As the Director of Infrastructure Platform and Shared Services, you will oversee multiple teams focused on edge networking, the Kubernetes (K8s) platform, CI/CD, observability, and automation platform and tooling.

Requirements

  • 8+ years of experience in technical leadership & people management.
  • Extensive experience using Agile and DevOps methodologies to build product infrastructure and shared services at scale.
  • 3+ years of experience running large-scale infrastructure platforms supporting a SaaS/cloud service in a public cloud, preferably AWS. Experience supporting a multi-cloud environment is a plus.
  • Strong expertise in cloud-native architectures, containerization (Kubernetes), IaC (Terraform), and CI/CD pipelines.
  • Strong background and hands-on experience in software development, PaaS, and automation.
  • Deep experience building and operating observability platforms and monitoring tools (Grafana, Splunk, APM, etc.) in a large-scale environment.
  • Demonstrated ability to lead cross-functional teams and manage large-scale programs.
  • Effective verbal and written communication and interpersonal skills.
  • Degree in Computer Science or a related field, or equivalent experience.

Responsibilities

  • Lead the Infrastructure Platform and Shared Services organization and drive initiatives across the SRE and Infrastructure organization.
  • Lead the DevOps transformation, microservice journey, and next-generation infrastructure platform capabilities in partnership with architects and product engineering.
  • Build a world-class observability platform and monitoring capabilities enabled with self-service.
  • Accelerate the velocity of SRE and product engineering by developing robust platforms, powerful tooling, and intuitive self-service capabilities.
  • Own the design and operation of scalable, self-service Cloud infrastructure platforms (e.g., Kubernetes, service mesh, CI/CD pipelines, IaC & Edge Infrastructure).
  • Lead, mentor, and grow a high-performing team of engineers and managers across platform, infrastructure, and shared services domains.
  • Perform engineering design evaluations and ensure the completion of projects within resource, budget, and scheduling constraints.
  • Improve SDLC processes for cloud infrastructure as code, including the maturity of CI/CD pipelines and change and release management.
  • Manage service and business expectations and prioritize resource allocation.
  • Maintain a deep knowledge of industry best practices, evolving trends, and technologies.

Benefits

  • Health insurance
  • Dental insurance
  • Vision insurance
  • 401(k)
  • Flexible spending account
  • Paid leave (including PTO and parental leave)