Technical Program Manager, Capacity Tooling

OpenAI • San Francisco, CA

About The Position

OpenAI’s Capacity Planning team ensures that our research and product teams have the compute, storage, and networking resources they need, when they need them. We work across engineering, product, and research to forecast demand, track supply, and optimize compute utilization. Our goal is to develop data-driven, automated, and scalable planning systems that unlock the next generation of frontier AI models.

We are looking for a Capacity Tooling Engineer to design, build, and maintain the internal platforms, services, and dashboards that power OpenAI’s capacity planning and allocation processes. You will create the tooling that helps us forecast usage, model scenarios, and make multi-billion-dollar infrastructure decisions. Your work will directly impact how we allocate compute across research, product launches, and strategic initiatives.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

Requirements

Deep expertise in one or more of the following areas: GPUs, CPUs, storage, networking.
Experience in AI/ML and/or cloud infrastructure.
Ability to make complex decisions with significant engineering, commercial, product, and research implications, often involving many billions of dollars.
Ability to thrive in ambiguity and work on a lean team as a self-starter.

Nice To Haves

Excitement about building infrastructure at incredible scale.
Ability to move fast, make decisions, and be held accountable.
Ability to wear multiple hats and juggle technical, business, and engineering considerations.

Responsibilities

Build and scale capacity-planning tooling that incorporates data pipelines, forecasting dashboards, allocation solvers, and scenario modeling tools.
Integrate data sources from infrastructure teams, data science, and multiple cloud providers to create a single source of truth for compute supply, demand, and costs.
Develop real-time reporting and alerting to surface supply gaps, utilization trends, and risks to leadership.
Design and implement automations to streamline workflows such as demand collection and supply allocation.
Design and implement optimization engines and solvers that recommend optimal allocation of compute.
Build interactive models that allow leadership to test "what-if" scenarios (e.g., varying levels of user growth, price changes, or new product launches).

Benefits

Relocation assistance for new employees.
Hybrid work model of 3 days in the office per week.

What This Job Offers

Number of Employees

1,001-5,000 employees