ChatGPT Performance Engineer

OpenAI•San Francisco, CA

About The Position

OpenAI is looking for an experienced Performance Engineer to help us scale the performance, reliability, and efficiency of our systems. In this role, you'll apply deep technical expertise to optimize infrastructure and application-level performance across mission-critical products like ChatGPT and our developer API. You’ll work cross-functionally with teams building core services, training models, and developing real-time user experiences to push our latency, throughput, and cost-efficiency to the next level. We are looking for engineers who thrive in ambiguous environments, value deep systems understanding, and are motivated by delivering measurable impact. This is a highly technical, individual contributor role focused on root-cause analysis, profiling, instrumentation, and architecture-level performance improvements across our stack.

Requirements

Have 7+ years of experience in software engineering with a strong track record in performance or reliability of high-scale distributed systems.
Are deeply comfortable with performance profiling tools and tracing systems.
Have experience optimizing performance across one or more layers of the stack (e.g., database, networking, storage, application runtime, GC tuning, Python/Golang internals, GPU utilization).
Have a strong understanding of OS internals, scheduling, memory management, and IO patterns.
Have contributed to observability, benchmarking, or performance-focused infrastructure at scale.
Have demonstrated success navigating ambiguity and aligning stakeholders around performance goals.
Value simplicity, rigor, and collaboration when solving complex systems problems.

Responsibilities

Analyze and optimize performance across application, middleware, runtime, and infrastructure layers—networking, storage, Python runtime, GPU utilization, and beyond.
Develop tooling and metrics that provide deep observability into system performance.
Collaborate closely with infra, platform, training, and product teams to identify key performance goals and drive systemic improvements.
Influence architecture and design decisions to prioritize latency, throughput, and efficiency at scale.
Lead investigations into high-impact performance regressions or scalability issues in production.
Drive performance testing strategies and help define SLAs/SLOs around latency and throughput for critical systems.

Stand Out From the Crowd

Upload your resume and get instant feedback on how well it matches this job.

Upload and Match Resume