Performance Modeling Lead

OpenAI
San Francisco, CA
Hybrid

About The Position

We are seeking a Performance Modeling Lead to build and lead a small, high-impact team responsible for answering forward-looking architectural questions across AI infrastructure systems. You will develop modeling frameworks and methodologies to evaluate system-level tradeoffs and guide key design decisions. Your work will directly influence reference architectures, vendor designs, and long-term infrastructure strategy. This role sits at the intersection of AI workloads, system architecture, and quantitative modeling, and requires strong technical judgment, ownership, and the ability to translate complex analysis into clear, actionable guidance.

Requirements

  • Experience owning or building performance modeling frameworks used to drive real system design decisions.
  • Deep knowledge of AI/ML workloads, including training and/or inference at scale.
  • Understanding of system-level tradeoffs across compute, memory, and networking in large-scale distributed systems.
  • Comfort working across abstraction layers, from workload behavior to hardware implementation.
  • Experience using modeling (analytical or simulation) to inform architectural decisions.
  • Ability to operate in ambiguous problem spaces and turn open-ended questions into structured analysis.
  • Clear communication skills and the ability to influence both internal teams and external partners.

Nice To Haves

  • Experience working with hardware vendors (ODM/JDM, silicon, networking).
  • Background in data center infrastructure or hyperscale systems.
  • Familiarity with accelerators (GPUs/ASICs) and interconnects (e.g., NVLink, InfiniBand, Ethernet).
  • Experience influencing hardware roadmaps or reference architectures.
  • Prior experience leading or mentoring engineers.

Responsibilities

  • Build and own a performance modeling framework/toolchain to evaluate AI systems across multiple levels of abstraction.
  • Analyze and quantify architectural tradeoffs across compute, memory, networking, storage, and system topology.
  • Develop performance models to guide decisions on scale-up vs. scale-out architectures, interconnect and network design, and memory hierarchy and system balance.
  • Translate modeling outputs into clear recommendations for internal teams and external hardware vendors.
  • Influence reference designs and vendor roadmaps through data-driven insights.
  • Partner closely with machine learning, systems, and hardware teams to understand workload characteristics and requirements.
  • Lead and grow a small team (2–3 engineers), setting technical direction and maintaining high standards for modeling rigor.
  • Continuously improve modeling fidelity by validating against real system behavior and measurements.

Benefits

  • Relocation assistance
  • Reasonable accommodations for applicants with disabilities