SWE (RL Environments)

Recruiting From Scratch • San Francisco, CA

5d • Onsite

About The Position

Our client is building the training data and evaluation infrastructure powering frontier AI labs. They work directly with top AI companies including OpenAI, Meta, DeepMind, and other frontier model organizations. The company reached $100M ARR in under 18 months and recently raised a $30M Series A. They specialize in high-signal datasets, evaluation infrastructure, RLHF/RLVR pipelines, and agentic AI training systems. The team is extremely talent-dense, with backgrounds from Citadel, Palantir, NVIDIA, Databricks, Goldman Sachs, and leading AI startups. It is a small, execution-heavy environment where engineers directly shape how frontier models learn and improve. This is an opportunity to work at the frontier of reinforcement learning, evaluation systems, synthetic data, and AI experimentation infrastructure.

Requirements

1–6 years of software engineering experience
Explicit hands-on experience building reinforcement learning environments
Strong backend or fullstack engineering background
Strong Python engineering skills
Experience building AI infrastructure, evaluation systems, or simulation environments
Experience with RLHF, RLVR, supervised fine-tuning, or model evaluation workflows
Strong systems-thinking and quantitative reasoning ability
Experience building production-quality experimentation or benchmarking frameworks
Comfortable working across data pipelines, infrastructure, and backend systems
Experience at high-growth startups, AI companies, quant firms, or research-heavy environments
Ability to move quickly and operate autonomously in ambiguous environments
Strong ownership mentality with a bias for action and execution
Comfortable doing difficult, tedious, and highly iterative engineering work
Strong CS fundamentals and systems engineering capability

Nice To Haves

Explicit RL environment development experience in production
Experience at RL-focused AI startups or evaluation infrastructure companies
Experience building simulations, benchmark systems, or agentic AI evaluation frameworks
Strong side projects, published AI papers, or open-source contributions
Experience with RLHF, RLVR, synthetic data, or alignment tooling
Background from top AI startups, quant firms, or elite engineering organizations
Experience building fast experimental systems with strong iteration speed
Experience with data quality measurement and evaluation metrics
Strong backend engineering depth combined with AI systems exposure
Experience working directly with researchers or model training teams
Founder or early startup engineering experience
Experience building complex AI infrastructure from scratch
Track record of exceptional execution speed and technical ownership
Top-tier university background in CS, engineering, math, or related fields

Responsibilities

Build reinforcement learning environments used to train and evaluate frontier AI systems
Design datasets and evaluation rubrics that expose meaningful model failure modes
Develop RLHF and RLVR reward signals and experimentation frameworks
Create scalable pipelines for real-world and synthetic data generation
Build quantitative frameworks for measuring dataset quality, diversity, and downstream model impact
Design simulations and environments across domains like coding, finance, enterprise workflows, and reasoning
Partner directly with frontier AI lab researchers on training objectives and evaluation methodologies
Rapidly prototype and ship experimental infrastructure and tooling
Diagnose model weaknesses and develop environments that improve model capabilities
Work on backend-heavy AI infrastructure and experimentation systems
Develop scalable evaluation and benchmarking systems for agentic AI workflows
Iterate quickly from hypothesis to production experiments
Build V1 systems independently with high ownership and minimal process overhead
Operate in a highly execution-focused startup environment with strong technical intensity