Capacity Systems Software Engineer

OpenAI
San Francisco, CA
$125,000 - $228,000
Hybrid

About The Position

OpenAI's Industrial Compute organization is responsible for planning, delivering, operating, and optimizing the compute infrastructure that powers frontier AI. As OpenAI scales toward becoming an intelligence utility, Industrial Compute coordinates a complex lifecycle spanning infrastructure strategy, capacity planning, provider partnerships, fleet operations, product demand, and financial planning. The organization manages one of the largest and fastest-growing compute footprints in the world, where decisions around capacity allocation, deployment readiness, utilization, reliability, and product demand directly impact product availability, customer experience, and business performance.

The Capacity Systems team builds the software platforms, data systems, and automation frameworks that connect these functions into a shared operating model. We transform fragmented planning workflows into scalable systems that enable teams to understand what compute was contracted, delivered, healthy, allocated, and ultimately converted into business and research outcomes.

We are seeking a Capacity Systems Software Engineer to build the platforms and services that power Industrial Compute planning, forecasting, optimization, and operational decision-making. In this role, you will design and develop software systems that connect infrastructure delivery, fleet health, capacity allocation, demand forecasting, deployment readiness, financial planning, and product consumption into a unified system of record. Your work will help OpenAI make better decisions about where compute should be deployed, how capacity should be allocated, and how infrastructure investments translate into business value.

You will partner closely with Capacity Planning, Fleet Operations, Infrastructure Engineering, Product, Finance, Supply Chain, and Strategic Sourcing teams to replace spreadsheet-driven workflows with scalable software systems that enable visibility, automation, and decision support across OpenAI's global compute footprint. This role is ideal for engineers who enjoy building internal platforms, operational systems, workflow automation, and data-intensive applications that sit at the intersection of software, infrastructure, and business operations.

This role is based in San Francisco and follows OpenAI's hybrid work model of 3 days per week in the office.

Requirements

  • 5+ years of experience in software engineering, platform engineering, infrastructure engineering, or related technical disciplines.
  • Strong programming experience in Python, Go, Java, TypeScript, or similar languages.
  • Experience building distributed systems, backend services, internal platforms, workflow systems, or operational tooling.
  • Experience designing APIs, data pipelines, and integrations across multiple systems.
  • Strong system design and software architecture skills.
  • Experience working with large operational datasets and business-critical workflows.
  • Ability to operate effectively in highly cross-functional environments and translate ambiguous operational challenges into scalable technical solutions.
  • Strong ownership mindset and ability to independently drive complex projects.

Nice To Haves

  • Experience building planning systems, forecasting platforms, optimization engines, or decision-support tools.
  • Experience with SQL, data warehouses, orchestration frameworks, analytics platforms, and distributed data systems.
  • Experience supporting infrastructure, cloud platforms, data centers, hardware deployment programs, or large-scale operational environments.
  • Familiarity with capacity planning, supply chain systems, financial modeling, or infrastructure operations.
  • Experience replacing spreadsheet-driven workflows with scalable software platforms.
  • Experience building systems that support scenario planning, forecasting, optimization, or resource allocation.
  • Familiarity with AI infrastructure, hyperscale compute environments, or large-scale distributed systems.

Responsibilities

  • Design and build software systems that serve as the system of record for Industrial Compute planning and operations.
  • Develop backend services, APIs, workflows, and data platforms that support capacity forecasting, allocation, deployment readiness, and operational planning.
  • Build applications that connect infrastructure delivery, fleet health, capacity utilization, product demand, and financial planning into a shared operational view.
  • Build planning and scenario-modeling systems that help leaders understand tradeoffs across capacity, utilization, cost, reliability, launch timing, and business impact.
  • Create workflow automation and decision-support tooling that improves planning accuracy and reduces operational overhead.
  • Partner with Capacity Planning, Fleet Operations, Product, Finance, Infrastructure Engineering, Supply Chain, and Strategic Sourcing teams to understand operational workflows and translate them into software systems.
  • Drive architecture decisions across planning platforms, operational tooling, and internal infrastructure systems.
  • Improve data quality, observability, and operational visibility across Industrial Compute programs.
  • Build extensible software foundations that scale alongside OpenAI's rapidly growing infrastructure footprint.

Benefits

  • Hybrid work model