Machine Learning Engineer, LLM Post-Training

NewsBreak
Mountain View, CA
Remote

About The Position

NewsBreak is seeking a hands-on Machine Learning Engineer to lead the post-training of its large language models, with a primary focus on reinforcement learning (RL). The role owns the entire post-training stack, including continuous pre-training (CPT), supervised fine-tuning (SFT), and RL, as well as the data preparation each stage requires. The engineer will work directly with product and business teams to translate real-world use cases into training objectives and rapidly ship model improvements. This is a high-ownership position for someone with practical experience training models.

Requirements

  • Hands-on LLM post-training experience: you have personally run CPT, SFT, and RL training, and your RL experience (RLHF / PPO / GRPO / DPO or similar) goes beyond launching training scripts.
  • Strong data engineering for ML, with the ability to independently design data-preparation plans for a given business scenario — sourcing, cleaning, filtering, labeling strategy, and synthetic/preference data generation — to meet specific product requirements.
  • Proven large-scale GPU training experience, including training LLMs on mid-to-large GPU clusters, and comfort with distributed training and debugging at scale.
  • Strong PyTorch fundamentals; working familiarity with frameworks such as Hugging Face TRL/Accelerate, DeepSpeed or FSDP, and inference engines like vLLM.
  • Solid understanding of tokenization, attention, chat templates, and common failure modes in alignment/agent training.
  • A bias toward fast iteration and business impact, with strong communication skills to work across research and product teams.

Nice To Haves

  • Experience designing reward models or rule-based verifiers for RL.
  • Experience with tool-use / agentic model training (function calling, multi-step planning).
  • Publications or open-source contributions in LLM post-training or RL.

Responsibilities

  • Lead post-training of LLMs across the full pipeline: continuous pre-training, SFT, and reinforcement learning, with RL as the primary focus (e.g., RLHF, PPO, GRPO, DPO, and related methods).
  • Design, build, and curate data for each training stage (instruction/SFT datasets, preference pairs, reward signals, on-policy rollouts, and rejection-sampled completions), and define data-preparation strategies tailored to specific business needs.
  • Partner closely with business and product stakeholders to understand their scenarios, rapidly convert requirements into training plans, and deliver targeted model capabilities on tight timelines.
  • Run large-scale training on mid-to-large GPU clusters, applying distributed-training techniques (data parallelism, FSDP, and where relevant tensor/pipeline parallelism) and tuning for throughput and stability.
  • Build and maintain evaluation and reward/verifier pipelines to measure model quality, prevent regressions, and ensure training–serving consistency.
  • Stay current with post-training research and turn promising techniques into working, production-ready code.

Benefits

  • Health, dental, and vision care for you and your family (100% coverage for employee)
  • Top-tier 401(k) plan with company matching
  • Paid time off and paid holidays
  • FSA, HSA, and commuter benefit programs
  • Team activity budget