RESEARCHER, POST-TRAINING

MakerMaker • San Francisco, CA

12d • Onsite

About The Position

We're building autonomous research agents for recursive self-improvement: multi-agent systems that propose, run, and analyze machine learning experiments. We're a small team working on-site in San Francisco. You'll lead our work on model post-training: supervised fine-tuning, preference data, reinforcement learning from human and AI feedback, reward modeling, and the evaluation suites that tell us what's actually working. You'll own a research area that meaningfully shapes our models' behavior and capabilities. This is a hands-on senior research role: you'll set direction, run experiments, and ship into production. You'll partner with the data, infrastructure, and engineering teams to make the post-training pipeline reliable and fast, because improvements there compound into every model we ship.

Requirements

Strong track record of post-training research (SFT, RL, reward modeling) at a frontier-model lab or equivalent
5+ years of hands-on ML research experience
Comfort with large-scale data curation and preference-data pipelines
Experience designing evaluation suites for capabilities that aren't easily benchmarked
Fluent in PyTorch or equivalent; comfortable working at distributed-training scale
Strong statistical instincts: you'd notice a flawed comparison before someone else points it out
Strong written communication

Nice To Haves

PhD in ML, statistics, CS, or adjacent
Published research at NeurIPS, ICML, ICLR, COLM, RLC, or comparable venues
Experience with reward hacking detection, scaling reward models, or RLHF infrastructure
Synthetic data generation experience
Background in RL math (policy gradients, importance sampling, off-policy methods)
Open-source contributions to post-training infrastructure

Responsibilities

Lead post-training research: SFT, RLHF/RLAIF, RLVR, DPO and successor methods, reward modeling, preference data design
Design and curate the data that goes into post-training (from sourcing, to filtering, to quality assessment)
Build and maintain the evaluation suites that measure what matters; resist Goodharting your own benchmarks
Run rigorous experiments (controls, ablations, statistical significance) and write up internal findings clearly
Scale data pipelines and work with the infrastructure team to scale training
Identify and characterize failure modes (reward hacking, distribution drift, eval saturation) and design experiments to address them
Stay current on the post-training literature; bring useful methods in, ignore the noise
