About The Position

Amazon has launched a new research lab in San Francisco to develop foundational capabilities for useful AI agents. We're enabling practical AI to make our customers more productive, empowered, and fulfilled. Our work combines large vision-language models (VLMs) with reinforcement learning (RL) and world modeling to build General Thinking Agents that perceive, reason, and act in the real world. Our lab is a small, talent-dense team with the resources and scale of Amazon. Every GPU-hour matters. The lab's research velocity is directly gated by compute — more efficient utilization means more experiments, faster iteration, and more ambitious research. We need someone who treats compute optimization not as infrastructure work, but as a research multiplier that unblocks everything else.

Requirements

  • 5+ years of experience optimizing ML training and inference workloads at scale
  • Deep proficiency in CUDA and NCCL
  • Experience writing and optimizing custom GPU kernels
  • Proficiency in Python, C++, or related systems languages
  • Experience with ML inference serving architectures (e.g., SGLang, vLLM)
  • PhD or Master's degree in computer science, electrical engineering, or related field

Nice To Haves

  • Experience with quantization techniques and post-training optimization
  • Background in Kubernetes (EKS/K8s) and distributed ML infrastructure
  • Familiarity with reinforcement learning training pipelines
  • Experience leading a small technical team and mentoring engineers
  • Proven ability to work cross-functionally and make complex technical recommendations to research stakeholders
  • Strong Unix/systems fundamentals and ability to instrument, profile, and debug at the kernel level
  • Background in ML research with published work or equivalent applied contributions
  • Willingness to step outside typical role boundaries — this role spans systems engineering, ML optimization, and applied research

Responsibilities

  • Lead efforts to maximize GPU compute efficiency across the lab's training and inference workloads.
  • Ensure the lab exceeds the S-team target of 47.5% dynamic power utilization while enabling research to scale without interruption.
  • Own the full lifecycle of compute optimization on Leviathan, the EKS-based GPU cluster.
  • Instrument and profile existing experiments to identify inefficiency patterns.
  • Develop automated systems that critique running workloads against efficiency standards.
  • Generate change requests when efficiency standards are not met, without disrupting active experiments.
  • Lead and mentor a team of three engineers.
  • Work cross-functionally with the RL team and GTA (General Thinking Agents) effort.
  • Scope your own work, set milestones, and deliver results in time to hit the October utilization window.

Benefits

  • Full range of medical, financial, and/or other benefits
  • Equity
  • Sign-on payments