This role focuses on keeping our frontier RL training runs fast, reliable, and unblocked. You will work across engineering and infrastructure problems as they emerge, from scaling and orchestration issues to inference bottlenecks, numerical problems, and hardware failures, as well as supporting large horizontal integrations in the big run, like multi-agent capabilities or memory. This is a role for a strong generalist who quickly learns anything needed for the task, has high attention to detail, debugs deeply, and is motivated by fixing the highest-impact problem in front of the team.
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed