RESEARCHER, POST-TRAINING

MakerMaker
San Francisco, CA
Onsite

About The Position

We're building autonomous research agents for recursive self-improvement (multi-agent systems that propose, run, and analyze machine learning experiments). We're a small team based in San Francisco, on-site.

You'll lead our work on model post-training: supervised fine-tuning, preference data, reinforcement learning from human and AI feedback, reward modeling, and the evaluation suites that tell us what's actually working. You'll own a research area that meaningfully shapes our model behavior and capability.

This is a hands-on senior research role. You'll set direction, run experiments, and ship into production. You'll partner with the data, infrastructure, and engineering teams to make the post-training pipeline reliable and fast: improvements there compound into every model we ship.

Requirements

  • Strong track record of post-training research (SFT, RL, reward modeling) at a frontier-model lab or equivalent
  • 5+ years of hands-on ML research experience
  • Comfort with large-scale data curation and preference-data pipelines
  • Experience designing evaluation suites for capabilities that aren't easily benchmarked
  • Fluent in PyTorch or equivalent; comfortable working at distributed-training scale
  • Strong statistical instincts: you'd notice a flawed comparison before someone else points it out
  • Strong written communication

Nice To Haves

  • PhD in ML, statistics, CS, or an adjacent field
  • Published research at NeurIPS, ICML, ICLR, COLM, RLC, or comparable venues
  • Experience with reward hacking detection, scaling reward models, or RLHF infrastructure
  • Synthetic data generation experience
  • Background in RL math (policy gradients, importance sampling, off-policy methods)
  • Open-source contributions to post-training infrastructure

Responsibilities

  • Lead post-training research: SFT, RLHF/RLAIF, RLVR, DPO and successor methods, reward modeling, preference data design
  • Design and curate the data that goes into post-training (from sourcing, to filtering, to quality assessment)
  • Build and maintain the evaluation suites that measure what matters; resist Goodharting your own benchmarks
  • Run rigorous experiments (controls, ablations, statistical significance) and write up internal findings clearly
  • Scale data pipelines and partner with the infrastructure team to scale training
  • Identify and characterize failure modes (reward hacking, distribution drift, eval saturation) and design experiments to address them
  • Stay current on the post-training literature; bring useful methods in, ignore the noise
© 2026 Teal Labs, Inc