Frontier Data Lead, RL Gyms - US

Turing • San Francisco, CA

Remote

About The Position

Turing is seeking a Staff Research Engineer, RL Gyms, to design, build, and iterate on reinforcement learning (RL) environments that frontier AI labs use for model evaluation and improvement. This is a hands-on, deeply technical individual contributor role: you will own the architecture and implementation of complex, realistic environments that simulate real-world software systems and business workflows. You will collaborate directly with researchers at leading AI labs to translate their post-training objectives into environment specifications, and you will build the infrastructure that runs these environments at scale. You will also shape Turing's technical direction for environment design patterns, reward and verifier systems, and data generation pipelines.

Requirements

Deep RL & post-training expertise: Hands-on experience with RL fine-tuning, reward/verifier design, environment design, or RLHF pipelines, and an end-to-end understanding of the training loop.
Strong software engineering fundamentals: Ability to write clean, reliable, production-grade code.
Python and SQL proficiency required.
Experience designing database schemas and API interfaces for complex systems.
Systems thinking: Ability to decompose real-world applications into faithful simulated environments with realistic state, transitions, and edge cases.
Experimentation rigor: Comfort running training experiments, interpreting reward curves, diagnosing environment issues from model behavior, and iterating rapidly based on results.
Technical influence without authority: Track record of raising engineering quality across teams through design review, best practices, and mentorship.
Strong communicator with researchers: Ability to engage deeply with ML researchers, understand their objectives, and translate between research goals and engineering implementation.
Background in Computer Science, Machine Learning, or related technical field required.

Nice To Haves

Advanced degree (MS/PhD) in a relevant area preferred.

Responsibilities

Design and build RL environments that simulate real-world applications, defining database schemas, API interfaces, seed data, task distributions, and verifier logic.
Write production code daily alongside project teams.
Architect reward functions and verification systems that accurately measure agent performance on complex, multi-step workflows, ensuring they are well-calibrated, resistant to reward hacking, and aligned with client training goals.
Set design patterns and best practices for environment construction across multiple concurrent projects, establishing reusable frameworks, reviewing architectures, and raising the technical bar for environment quality, realism, and difficulty.
Define and enforce quality standards for environments and their data (agent trajectories, reward scores, task distributions), building automated validation, difficulty calibration, and diversity analysis tooling.
Work directly with researchers at frontier AI labs to understand post-training goals, translate them into environment requirements, and iterate based on experimental results, serving as the primary technical counterpart.
Run in-house RL fine-tuning experiments and eval benchmarks to measure model performance lifts on environments, using results to refine environment design and demonstrate value to clients.
Raise the capabilities of engineers on project teams through code review, architecture guidance, and pair programming.

Benefits

Competitive compensation
Flexible working hours
Equity
Amazing work culture (super collaborative & supportive work environment; 5 days a week)
Awesome colleagues (surround yourself with top talent from Meta, Google, LinkedIn, etc. as well as people with deep startup experience)
