AI Researcher - Reinforcement Learning

1X•San Carlos, CA

4h•$200,000 - $300,000•Onsite

About The Position

1X is building humanoid robots designed to perform household chores and tasks, aiming to give people more free time. This involves solving complex challenges in robotics, AI, and manufacturing simultaneously, at scale, and within a safe, family-friendly form factor. The company has been developing these robots since 2014 and is now focused on shipping its flagship product, NEO, a home robot designed to move, learn, and operate alongside people in real-world environments. The company is seeking individuals inspired by this mission who want to contribute to a product that will genuinely change how humans spend their time by creating abundance safely for all.

The Reinforcement Learning team is responsible for teaching NEO new capabilities by training policies for manipulation and locomotion tasks. This involves working across simulation and real-world environments, with a focus on the intersection of algorithm development, sim-to-real transfer, and production deployment. The team's success is measured by the reliable performance of trained policies on physical robots in homes.

The role involves owning the entire pipeline from RL algorithm development to production deployment, including training NEO on manipulation and locomotion tasks in simulation, bridging the sim-to-real gap, and shipping reliable policies for real-world home environments. This is critical-path work, as the robot's capabilities are directly dependent on the quality of the RL policies developed. The role requires close collaboration with hardware, controls, data collection, and QA teams, with impact measured by the robot's performance in the field.

Requirements

Strong Python and/or C++ with experience in large codebases and build tools (Bazel or equivalent)
Proficiency with PyTorch for RL policy training and experimentation
Hands-on experience with simulation platforms (Isaac Sim, MuJoCo, or equivalent) for policy training at scale
Demonstrated experience training RL policies for manipulation or locomotion tasks, including addressing the sim-to-real gap on physical hardware
Sim-to-real practitioner: closes the sim-to-real gap on physical systems; understands domain randomization, reward shaping, and the engineering required to make simulated policies transfer reliably to real hardware
RL algorithms depth: strong foundation in RL algorithms (PPO, SAC, TD-MPC, or similar); can choose the right approach for the task and modify or extend it when standard methods fall short
Full-stack ownership: owns data engineering, model architecture, and deployment; treats a promising training curve as the beginning of the job, not the end
Effective cross-functional partner: works closely with hardware, controls, QA, and data teams to translate RL research into deployed robot skills, and communicates technical constraints clearly across disciplines

Nice To Haves

Experience with model-based RL or world-model-guided policy learning that leverages predictive models to improve sample efficiency
Familiarity with imitation learning or learning from demonstration (behavior cloning, GAIL, IQL) as a complement or bootstrap to RL
Experience deploying RL-trained policies to physical robots in production environments, including monitoring, failure analysis, and iterative improvement
Background in legged locomotion, dexterous manipulation, or contact-rich control for physical systems

Responsibilities

Train and deploy RL policies for manipulation and locomotion tasks that perform reliably in real-world home environments, measured by field task success rates, not just simulation benchmarks
Advance sim-to-real transfer techniques that measurably narrow the gap between simulation training performance and real-world policy behavior, enabling faster iteration cycles
Build training and evaluation infrastructure that lets the team iterate on policies faster, with standardized benchmarks, automated regression detection, and clear connections between training metrics and field performance
Partner with hardware, controls, data, and QA teams to ship RL-trained skills to production customer sites, owning the handoff from research to deployment

Benefits

Comprehensive medical, dental, and vision coverage
Generous paid time off, company holidays, and parental leave
401(k) plan with company match (100% on the first 3% of contributions, 50% on the next 2%)
Flexible Spending Account (FSA) and Health Savings Account (HSA) options
Commuter benefits (transit and parking)
Short-term and long-term disability, and life insurance
Employee Assistance Program (EAP) for mental health, financial, and personal support
Onsite snacks and catered lunches
