About The Position

Amazon has launched a new research lab in San Francisco to develop foundational capabilities for useful AI agents. We're enabling practical AI to make our customers more productive, empowered, and fulfilled. Our work combines large vision-language models (VLMs) with reinforcement learning (RL) and world modeling to build General Thinking Agents that perceive, reason, and act in the real world. Our lab is a small, talent-dense team with the resources and scale of Amazon. Every GPU-hour matters. The lab's research velocity is directly gated by compute — more efficient utilization means more experiments, faster iteration, and more ambitious research. We need someone who treats compute optimization not as infrastructure work, but as a research multiplier that unblocks everything else.

Requirements

  • 5+ years of experience optimizing ML training and inference workloads at scale
  • Deep proficiency in CUDA and NCCL
  • Experience writing and optimizing custom GPU kernels
  • Proficiency in Python, C++, or related systems languages
  • Experience with ML inference serving architectures (e.g., SGLang, vLLM)
  • PhD or Master's degree in computer science, electrical engineering, or related field

Nice To Haves

  • Experience with quantization techniques and post-training optimization
  • Background in Kubernetes (EKS/K8s) and distributed ML infrastructure
  • Familiarity with reinforcement learning training pipelines
  • Experience leading a small technical team and mentoring engineers
  • Proven ability to work cross-functionally and make complex technical recommendations to research stakeholders
  • Strong Unix/systems fundamentals and ability to instrument, profile, and debug at the kernel level
  • Background in ML research with published work or equivalent applied contributions
  • Willingness to step outside typical role boundaries — this role spans systems engineering, ML optimization, and applied research

Responsibilities

  • Lead efforts to maximize GPU compute efficiency across the lab's training and inference workloads.
  • Ensure the lab exceeds the S-team target of 47.5% dynamic power utilization while enabling research to scale without interruption.
  • Own the full lifecycle of compute optimization on Leviathan, the EKS-based GPU cluster.
  • Instrument and profile existing experiments to identify inefficiency patterns.
  • Develop automated systems that critique running workloads against efficiency standards.
  • Generate change requests when efficiency standards are not met, without disrupting active experiments.
  • Lead and mentor a team of three engineers.
  • Work cross-functionally with the RL team and GTA (General Thinking Agents) effort.
  • Scope your own work, set milestones, and deliver results in time to hit the October utilization window.

Benefits

  • Full range of medical, financial, and/or other benefits
  • Equity
  • Sign-on payments