Distributed Systems/ML Engineer
OpenAI · Posted: August 18, 2023 · Hybrid
About the position
As a Distributed Systems/ML engineer on the Platform ML team at OpenAI, you will focus on improving the training throughput of our internal training framework while enabling researchers to experiment with new ideas. You will be responsible for optimizing performance, understanding the distributed systems involved, and ensuring bug-free machine learning code. The ultimate goal of this role is to push the field forward and accelerate progress towards AGI. This position is based in San Francisco, CA, and offers a hybrid work model with relocation assistance available.
Responsibilities
- Apply the latest techniques in the internal training framework to achieve hardware efficiency for training runs
- Profile and optimize the training framework
- Work with researchers to enable them to develop the next generation of models
- Run small scale ML experiments
- Figure out how systems work and continuously come up with ideas to make them faster while minimizing complexity and maintenance burden
- Apply strong software engineering skills, particularly in Python, to the training framework
Requirements
- Good engineering skills, including designing, implementing, and optimizing state-of-the-art AI models
- Proficiency in writing bug-free machine learning code
- Deep understanding of supercomputer performance
- Experience in optimizing performance and understanding distributed systems
- Ability to run small scale ML experiments
- Strong software engineering skills, particularly in Python
Benefits
- Medical, dental, and vision insurance for you and your family
- Mental health and wellness support
- 401(k) plan with 4% matching
- Unlimited time off and 18+ company holidays per year
- Paid parental leave (20 weeks) and family-planning support
- Annual learning & development stipend ($1,500 per year)
- Generous equity compensation
- Annual salary range of $245,000 - $385,000 USD