AWS Trainium is deployed at scale, with millions of chips in production, used for training and inference of frontier models. AWS Neuron is the software stack for Trainium, enabling customers to run deep learning and generative AI workloads with optimal performance and cost efficiency.

AWS Neuron is hiring a Principal Technical Product Manager to define and drive product strategy for training software on Trainium. This includes distributed training libraries, post-training workflows (RLHF, DPO, fine-tuning), reinforcement learning frameworks, and training performance optimization. Your mission is to enable researchers and operators to train frontier models at scale on Trainium, from single-node experimentation to distributed training across thousands of nodes.

You will be the champion inside AWS for frontier model builders pushing the bounds of scale and resilience for current and emerging training paradigms. You will work with customers inside and outside the company to identify key improvements and stay ahead of the training landscape. You will define how Neuron supports the AI/ML training ecosystem and what tools customers will use for their training workflows on Trainium.

To be successful, you will partner with engineering teams building training libraries and distributed training infrastructure, applied scientists developing optimization techniques, and PMs responsible for compiler, runtime, NKI, and infrastructure. You will develop deep knowledge of AI/ML training architectures, distributed training systems, model parallelism strategies, and training performance optimization to effectively define product strategy and make informed technical decisions.

The Ideal Candidate
The ideal candidate will have a solid understanding of large-scale model training, distributed training architectures, post-training workflows, and reinforcement learning. They should be able to assess the technical implications of training software stack decisions, understand customer needs, and drive developer experience improvements. The ideal candidate can navigate ambiguity in a fast-moving, early-stage initiative, balance competing priorities across multiple workstreams, and drive alignment across engineering and science stakeholders with excellent written and verbal communication abilities.
Job Type
Full-time
Career Level
Principal