Want to build simulated RL environments that push frontier models to their limits? This role focuses on advancing the science of post-training, reinforcement learning, and scalable evaluation. Instead of relying on static benchmarks, you’ll create dynamic simulations that probe reasoning, planning, and long-horizon behaviour: work that helps define how the next generation of AI models will be trained and supervised. You’ll design new post-training algorithms (RLHF, DPO, GRPO, and beyond), develop reward models that go beyond exact-match signals, and publish your findings while seeing them deployed in production systems. The work spans core research and practical implementation, giving you the chance to shape frameworks already being adopted by industry leaders. Ready to help define how AI learns and is evaluated in simulated environments? All applicants will receive a response.
Job Type
Full-time
Education Level
Ph.D. or professional degree