We are looking for engineers to help build and operate the next generation of compute infrastructure powering OpenAI's frontier research. This is an opportunity to work on the large-scale clusters, high-performance networks, and supercomputing systems that enable some of the most advanced AI workloads in the world.

In this role, you'll combine distributed systems engineering with hands-on infrastructure work across some of our largest data centers. You'll help scale Kubernetes clusters to massive size, automate bare-metal bring-up, and build the software layers that make heterogeneous GPU fleets and multi-datacenter supercomputing environments easier to operate.

You'll work where hardware and software meet, in an environment where speed, efficiency, and reliability are critical. That means solving real-time operational challenges, quickly diagnosing and fixing issues as they arise, and continuously improving automation, resilience, performance, and uptime across the systems that power frontier model training.
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed
Number of Employees
1-10 employees