The Frontier Systems team at OpenAI builds, launches, and supports the largest supercomputers in the world that OpenAI uses for its most cutting edge model training. We take data center designs, turn them into real, working systems and build any software needed for running large-scale frontier model trainings. Our mission is to bring up, stabilize and keep these hyperscale supercomputers reliable and efficient during the training of the frontier models. As a Software Engineer on the Frontier Systems team focused on power management, you will work on critical infrastructure to support cutting-edge research. With large-scale supercomputers consuming substantial amounts of power, managing this efficiently is key to maximizing computational capacity. This role is critical to ensuring that our cutting-edge research supercomputing infrastructure runs smoothly, while maintaining reliability and grid-level power stability. Our team empowers strong engineers with a high degree of autonomy and ownership, as well as ability to effect change. This role will require a keen focus on system-level comprehensive investigations and the development of automated solutions. We want people who go deep on problems, investigate as thoroughly as possible, and build automation for detection and remediation at scale.
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed
Number of Employees
1,001-5,000 employees