Engineering Manager, Core Services

OpenAI•San Francisco, CA

54d•Onsite

About The Position

The Core Services organization builds and runs the mission-critical online services that product teams rely on in production. We own foundational distributed systems and platform capabilities that enable reliable execution, high-performance services, and large-scale file/data needs across our products. This team is distinct from developer infrastructure and data infrastructure: our focus is production service foundations and core runtime services.

About the Role

We’re hiring an Engineering Manager, Core Services to help lead teams responsible for highly reliable, high-scale distributed systems that sit on the critical path for OpenAI products. Your team will own foundational production systems that OpenAI’s product engineering teams build on. You’ll collaborate closely with product and infrastructure partners to ship reliable services quickly, and help scale systems and teams as OpenAI grows. You’ll partner closely with senior engineering leaders to scale the org, mature operations, and drive major platform initiatives. This role requires strong technical ability.

Requirements

Have significant experience leading teams that run mission-critical infrastructure in production.
Have experience operating mission-critical services or core distributed systems building blocks.
Have built platform-like systems (e.g., orchestration/workflow execution, service platforms) and/or large-scale storage/blob/file infrastructure.
Have experience building systems spanning multiple cloud environments.
Take pride in building scalable, reliable systems and improving operational health.
Own problems end-to-end and move fast in environments with ambiguity and competing priorities.

Responsibilities

Managing and growing a high-performing team of infrastructure engineers.
Leading teams building and operating large, critical production platforms, including cluster reliability, scaling, and rollout safety.
Building and operating mission-critical distributed systems with strong operational rigor (SLOs, incident response, capacity planning, reliability).
Setting technical direction for platform foundations such as workflow/orchestration capabilities, large-scale file/blob/storage services, and core service foundations.
Partnering with a broad set of stakeholders, including product engineering, adjacent infrastructure teams, and (where relevant) finance/cost partners.
Coaching, mentoring, and developing engineers and emerging leaders.