About The Position

As a Staff Research Scientist, you will play a pivotal role in shaping the future of large language model (LLM) alignment by leading research and development at the intersection of data quality and post-training techniques such as RLHF, preference optimization, and reward modeling. You will operate at the forefront of model alignment, with a focus on ensuring the integrity, reliability, and strategic use of supervision data that drives post-training performance. You’ll set research direction, influence cross-functional data standards, and lead the development of scalable systems that diagnose and improve the data foundations of frontier AI.

Requirements

  • PhD or equivalent experience in machine learning, NLP, or data-centric AI, with a track record of leadership in LLM post-training or data quality research.
  • 5 years of academic or industry experience beyond the PhD.
  • Deep expertise in RLHF, preference data pipelines, reward modeling, or evaluation systems.
  • Demonstrated experience designing and scaling data quality infrastructure — from labeling frameworks and validation metrics to automated filtering and dataset optimization.
  • Strong engineering proficiency in Python, PyTorch, and ecosystem tools for large-scale training and evaluation.
  • A proven ability to define, lead, and execute complex research initiatives with clear business and technical impact.
  • Strong communication and collaboration skills, with experience driving strategy across research, engineering, and product teams.

Nice To Haves

  • Experience with data valuation (e.g. influence functions, Shapley values), active learning, or human-in-the-loop systems.
  • Contributions to open-source tools for dataset analysis, benchmarking, or reward model training.
  • Familiarity with evaluation challenges such as annotation disagreement, subjective labeling, or multilingual feedback alignment.
  • Interest in the long-term implications of data quality for AI safety, governance, and deployment ethics.

Responsibilities

  • Lead high-impact research on data quality frameworks for post-training LLMs — including techniques for preference consistency, label reliability, annotator calibration, and dataset auditing.
  • Design and implement systems for identifying noisy, low-value, or adversarial data points in human feedback and synthetic comparison datasets.
  • Drive strategy for aligning data collection, curation, and filtering with post-training objectives such as helpfulness, harmlessness, and faithfulness.
  • Collaborate cross-functionally with engineers, alignment researchers, and product leaders to translate research into production-ready pipelines for RLHF and direct preference optimization (DPO).
  • Mentor and influence junior researchers and engineers working on data-centric evaluation, reward modeling, and benchmark creation.
  • Author foundational tools and metrics that connect supervision data characteristics to downstream LLM behavior and evaluation performance.
  • Publish and present research that advances the field of data quality in LLM post-training, contributing to academic and industry best practices.

Benefits

  • Equity in a fast-growing company
  • 401(k) match, competitive compensation, financial coaching
  • Paid parental leave, fertility benefits, parental coaching
  • Medical, dental, and vision coverage, mental health support, $500 wellness stipend
  • $2,000 learning stipend and ongoing professional development
  • Stipends for home office setup, internet, and commuting, plus free lunch and gym access in our SF office
  • Flexible PTO, 15 holidays + 2 flex days, winter #ShakeBreak where our whole office closes for a week!
  • Team outings & referral bonuses

What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Education Level: PhD or professional degree
  • Number of Employees: 501-1,000
