Research Intern, Agent RL Training

NewsBreak
Mountain View, CA
Onsite

About The Position

NewsBreak is seeking a Research Intern to join our Agent RL Training team. This role involves exploring the application of large language models (LLMs) to NewsBreak's core business, including content understanding, recommendation, agentic web browsing, and autonomous multi-step task completion. The intern will be paired with a full-time mentor and will be expected to independently drive experiments, propose novel ideas, and iterate quickly. This is a hands-on research role for self-starters with deep intellectual curiosity and a drive to push boundaries in LLM post-training and agent capabilities. The internship is onsite at the Mountain View, CA office.

Requirements

  • Highly motivated and committed: willing to put in extra hours when needed to push projects across the finish line
  • Genuine passion for research: you read papers for fun, tinker with models on weekends, and care deeply about advancing the field
  • Independently capable of end-to-end model supervised fine-tuning (SFT), with a basic understanding of RL-based post-training methods (RLHF, DPO, PPO, GRPO, etc.)
  • Excellent taste in model behavior: able to reason about what “good” looks like across user-facing domains and articulate why
  • Strong Python and PyTorch skills

Nice To Haves

  • Publication at a top-tier venue (NeurIPS, ICML, ICLR, ACL, EMNLP, or equivalent)
  • Experience with multi-node distributed training (FSDP, DeepSpeed, Megatron-LM)
  • Proficiency in writing custom GPU kernels with Triton or CUDA
  • Experience building synthetic data pipelines for agent training
  • Familiarity with open-source RL frameworks: TRL, OpenRLHF, veRL/vLLM

Responsibilities

  • Collaborate with your full-time mentor to identify high-impact research directions for applying LLMs to NewsBreak’s products
  • Independently run end-to-end SFT experiments on LLM-based agents, and assist with RL-related exploration such as reward design and training iteration
  • Curate and build high-quality training datasets: instruction-following, preference pairs, agent trajectories, and synthetic data
  • Contribute to public publications; we encourage and support top-venue submissions during your internship

Benefits

  • Hourly Pay: $35-$50
  • Discretionary bonus and options may be available depending on the position.
  • Great Place to Work® certified