Machine Learning Engineer, LLM Post-Training

NewsBreak
Mountain View, CA
$150,000 - $230,000
Remote

About The Position

We are looking for a hands-on Machine Learning Engineer to drive the post-training of our large language models, with a strong emphasis on reinforcement learning (RL). You will own the full post-training stack: continuous pre-training (CPT), supervised fine-tuning (SFT), and RL, along with the data preparation that powers it. Just as important, you will work directly with product and business teams to translate real-world use cases into concrete training objectives and ship model improvements quickly. This is a high-ownership role for someone who has actually trained models, not just read about training them.

Requirements

  • Hands-on LLM post-training experience. You have personally run CPT, SFT, and RL training — with demonstrated, practical RL experience (RLHF / PPO / GRPO / DPO or similar), beyond just launching training scripts.
  • Strong data engineering for ML. You can independently design data-preparation plans for a given business scenario — sourcing, cleaning, filtering, labeling strategy, and synthetic/preference data generation — to meet specific product requirements.
  • Proven large-scale GPU training ability. You have trained LLMs on mid-to-large GPU clusters and are comfortable with distributed training and debugging at scale.
  • Strong PyTorch fundamentals; working familiarity with frameworks such as Hugging Face TRL/Accelerate, DeepSpeed or FSDP, and inference engines like vLLM.
  • Solid understanding of tokenization, attention, chat templates, and common failure modes in alignment/agent training.
  • A bias toward fast iteration and business impact, with strong communication skills to work across research and product teams.

Nice To Haves

  • Experience designing reward models or rule-based verifiers for RL.
  • Experience with tool-use / agentic model training (function calling, multi-step planning).
  • Publications or open-source contributions in LLM post-training or RL.

Responsibilities

  • Lead post-training of our LLMs across the full pipeline: continuous pre-training, SFT, and reinforcement learning, with RL as the primary focus (e.g., RLHF, PPO, GRPO, DPO, and related methods).
  • Design, build, and curate the data that drives each training stage — instruction/SFT datasets, preference pairs, reward signals, on-policy rollouts, and rejection-sampled completions — and define data-preparation strategies tailored to specific business needs.
  • Partner closely with business and product stakeholders to understand their scenarios, rapidly convert requirements into training plans, and deliver targeted model capabilities on tight timelines.
  • Run large-scale training on mid-to-large GPU clusters, applying distributed-training techniques (data parallelism, FSDP, and where relevant tensor/pipeline parallelism) and tuning for throughput and stability.
  • Build and maintain evaluation and reward/verifier pipelines to measure model quality, prevent regressions, and ensure training–serving consistency.
  • Stay current with post-training research and turn promising techniques into working, production-ready code.

Benefits

  • Health, dental, and vision care for you and your family (100% coverage for employees)
  • Top-tier 401(k) plan with company matching
  • Paid time off and paid holidays
  • FSA, HSA, and commuter benefits programs
  • Team activity budget