Research, Post-Training

Cognition
San Francisco, CA
Remote

About The Position

We are an applied AI lab building end-to-end software agents. We're the team behind Devin, the first AI software engineer, and Windsurf, an AI-native IDE. These products represent our vision for AI that doesn't just assist engineers, but works alongside them as a genuine teammate. Our team is small and talent-dense: world-class competitive programmers, former founders, and researchers from the frontier of AI, including Scale AI, Palantir, Cursor, Google DeepMind, and others.

Post-training is the critical bridge between raw model capability and a system that is actually useful, safe, and effective in the real world. You will shape how our agents learn by iterating on training recipes, evaluations, and alignment methods that directly determine what Devin and our future systems can do. This role blends deep research and hands-on engineering. We don't distinguish between the two.

Requirements

  • A track record of advancing ML systems through post-training, alignment, or related methods: RLHF, RLAIF, preference modeling, reward learning, or equivalent
  • Strong fundamentals in probability, statistics, and ML theory
  • The ability to look at experimental data and distinguish real effects from noise and bugs
  • Evidence of original contributions: publications at top venues, open-source impact, or equivalent industry results
  • Experience with large-scale distributed training and the debugging that comes with it
  • Systems-level thinking: not just model optimization, but understanding how training pipelines, data, and evaluation interact
  • Comfort with ambiguity and fast-moving research environments where priorities shift quickly

Responsibilities

  • Iterate on the full stack of datasets, training stages, and hyperparameters that determine model behavior.
  • Measure how choices compound across evals and production performance, not just isolated benchmarks.
  • Build evals that actually capture what matters.
  • Define evaluation systems, optimize against them, recognize their gaps, and rebuild them.
  • When training produces results that don't make sense, dig until you understand why, and carry that understanding forward.
  • Apply and advance techniques like RLHF, RLAIF, and constitutional approaches to shape how agents reason, act, and collaborate with humans in long-horizon tasks.
  • Measure how performance scales with data and compute, and develop new methodologies when existing ones hit ceilings.