About The Position

Own the full alignment stack for our models, from instruction tuning through RLHF and RLAIF. You will lead research on next-generation reward models and optimization objectives, curate training data and build synthetic data pipelines, optimize large-scale RL infrastructure, and partner with pre-training and evaluation teams to turn alignment research into generalizable model gains.

Requirements

  • Graduate degree (MS or PhD) in Computer Science, Machine Learning, or related discipline.
  • Deep technical command of alignment methodologies (PPO, DPO, rejection sampling) and experience scaling them to large models.
  • Strong engineering skills, comfortable diving into complex ML codebases and distributed systems.
  • Experience improving model behavior through data, reward modeling, or RL techniques.
  • Evidence of owning ambitious research or engineering agendas that led to measurable model improvements.
  • Ability to thrive in a fast-paced, high-agency startup environment with a bias toward action.
  • Passion for advancing the frontier of intelligence.

Responsibilities

  • Drive the entire alignment stack, spanning instruction tuning, RLHF, and RLAIF, to push the model toward high factual accuracy and robust instruction following.
  • Lead research efforts to design next-generation reward models and optimization objectives that significantly improve human preference (HP) performance.
  • Curate high-quality training data and design synthetic data pipelines that solve complex reasoning and behavioral gaps.
  • Optimize large-scale RL pipelines for stability and efficiency, ensuring rapid iteration cycles for model improvements.
  • Collaborate closely with pre-training and evaluation teams to create tight feedback loops that translate alignment research into generalizable model gains.

Benefits

  • Top-tier compensation: Salary and equity structured to recognize and retain the best talent globally.
  • Health & wellness: Comprehensive medical, dental, vision, life, and disability insurance.
  • Life & family: Fully paid parental leave for all new parents, including adoptive and surrogate journeys. Financial support for family planning.
  • Benefits & balance: Paid time off when you need it, relocation support, and more perks that optimize your time.
  • Team connection: Lunch and dinner provided daily, plus regular off-sites and team celebrations.
© 2024 Teal Labs, Inc.