About The Position

We build frontier foundation models that power intelligent experiences at Apple. Our team works across the full training lifecycle, including pre-training foundation models and developing mid-training approaches that bridge general capability and task-specific performance. What makes our work distinct is that we engineer models specifically for Apple silicon, optimized for experiences that are private, personal, and deeply integrated into the OS. We're solving frontier problems in building reward models that resist reward hacking, handling sparse and delayed rewards in agentic settings, and aligning models reliably across the spectrum from open-ended creative tasks to precise, action-taking workflows. If you're drawn to hard problems where the research and the product are inseparable, this is the team.

We are building the next generation of models optimized for agentic, reasoning, and coding capabilities. This means training models via RL to reason from first principles, building autonomous coding agents that operate in real repositories, and developing agentic systems that handle multi-step workflows with error recovery. You will work on problems such as RL with verifiable rewards for mathematical reasoning, multi-turn RL for coding agents evaluated on SWE-Bench and beyond, scaling laws for RL compute allocation, progressive alignment across capability stages, and training models to manage their own context in long-horizon tasks. This is applied research with direct product impact: your work will ship to millions of users.

Requirements

  • Demonstrated expertise in deep learning, with publications at top ML or NLP conferences or a track record of applying deep learning techniques to products
  • Proficient programming skills in Python and at least one deep learning framework, such as JAX, PyTorch, or TensorFlow
  • Ability to work in a collaborative environment
  • PhD in Computer Science or a related technical field, or equivalent practical experience

Nice To Haves

  • Reinforcement learning for LLMs: RLHF, GRPO, PPO, RLVR, reward modeling, RL scaling laws
  • Code generation and coding agents: repository-level code understanding, agentic coding
  • Agentic systems: multi-turn RL, tool-use planning, long-horizon task execution, user simulation
  • Distillation and alignment: on-policy distillation, reward-tilted distillation, cross-stage distillation to combine independently optimized capabilities into a single model
  • Long context and efficiency: sparse attention, context compression, scaling to very long context windows

Responsibilities

  • Train models via RL to reason from first principles.
  • Build autonomous coding agents that operate in real repositories.
  • Develop agentic systems that handle multi-step workflows with error recovery.
  • Work on problems such as RL with verifiable rewards for mathematical reasoning, multi-turn RL for coding agents evaluated on SWE-Bench and beyond, scaling laws for RL compute allocation, progressive alignment across capability stages, and training models to manage their own context in long-horizon tasks.

What This Job Offers

Job Type

Full-time

Career Level

Senior

Education Level

Ph.D. or professional degree

© 2026 Teal Labs, Inc