Head of Data Quality - RL Gyms

Turing • San Francisco, CA

About The Position

Turing is looking for a Head of Data Quality, RL Environments to build and lead the quality function for all reinforcement learning (RL) environment and trajectory data used to train and evaluate models at frontier AI labs. You will manage a team of Data Quality Leads who operate like researchers in a frontier AI lab, designing tasks, stress tests, and evaluation protocols for complex RL environments (simulated, real-world, and tool-based). Your role is to set the bar for what “high-quality RL environment data” means and to ensure that our environments, trajectories, rewards, and evaluations are robust, diverse, and aligned with cutting-edge GenAI and RL research. You’ll bring together a deep understanding of RL environments, agents, and trajectories; prior experience with ML/AI, RL, or GenAI systems; and strong organizational and people leadership to create a research-grade quality organization for RL environments and agent interaction data.

Requirements

Bachelor’s degree in Computer Science, Mathematics, Engineering, or a related field; or equivalent practical experience.
Strong technical background, including experience with: Python as a primary language; and RL or simulation frameworks (e.g., OpenAI Gym / Gymnasium-style APIs, custom simulators, or game engines).
7+ years total experience in software engineering, ML/AI, RL, simulation, or related fields.
3+ years managing technical teams (e.g., research, data science, RL / simulation, data quality, or engineering).
Hands-on experience with ML/AI systems, with a strong preference for: RL, RLHF/RLAIF, or agent-like systems (tool-using, web, or embodied agents); and environment or benchmark design, or large-scale agent evaluation.
Prior exposure to data annotation, human feedback, or human evaluation processes, including: designing rubrics and tasks for human raters; and working with preference data or trajectory labeling.
High-level understanding of modern trends in GenAI, RL, and agents, such as: LLM-based agents interacting with tools or environments; reward shaping, curriculum learning, and preference modeling; and safety, alignment, and robustness for agents in complex environments.
Strong grasp of data and environment quality principles: environment correctness, coverage, and diversity; reward design pitfalls and reward hacking detection; and human evaluation quality, calibration, and inter-rater reliability.
Ability to read ML/RL/AI research papers and translate them into: new environment or task requirements; evaluation and benchmarking strategies; and concrete annotation and quality-control workflows.
Excellent communication and leadership skills; comfortable setting direction and making tradeoff decisions in ambiguous, fast-changing domains.

Nice To Haves

Graduate degree (MS/PhD) in Computer Science, Machine Learning, Robotics, or a related field.
Experience working in or closely with a research lab or frontier AI organization focused on RL, agents, or aligned systems.
Direct experience with: designing RL benchmarks, simulators, or environment suites; RLHF/RLAIF pipelines or large-scale human feedback collection; and multi-agent or multi-task environments.
Familiarity with game engines or simulation platforms (e.g., Unity, Unreal, MuJoCo, Isaac, Habitat, or similar).
Background in statistics and experimental design, especially for: human feedback experiments; and A/B testing of environment or reward variants.
Experience in a high-growth startup or a similarly dynamic environment.

Responsibilities

Own the RL Environment Data Quality Vision & Strategy
Lead & Develop Data Quality Leads
Design Research-Grade Evaluation & Quality Systems for RL Environments
Translate AI & RL Research Trends into Environment and Data Requirements
Partner Across Operations, Product, and Customers
Build Tools, Processes, and Documentation

Benefits

Amazing work culture (Super collaborative & supportive work environment; 5 days a week)
Awesome colleagues (Surround yourself with top talent from Meta, Google, LinkedIn, etc., as well as people with deep startup experience)
Competitive compensation
Flexible working hours
