Senior ML Engineer, Fauna

Amazon • New York, NY

About The Position

We are seeking a Senior ML Engineer to build and scale the machine learning systems that power our intelligent robots. In this role, you will design and maintain the infrastructure for training, evaluating, and deploying the ML models that enable robot locomotion, perception, manipulation, navigation, and human-robot interaction. You'll work at the intersection of machine learning and systems engineering, ensuring our ML training and deployment systems are robust, efficient, and scalable as we grow from prototype to production.

Requirements

5+ years of non-internship professional software development experience
5+ years of programming experience with at least one software programming language
5+ years of experience leading design or architecture (design patterns, reliability, and scaling) of new and existing systems
Experience as a mentor or tech lead, or experience leading an engineering team
Bachelor's degree or above in computer science, machine learning, engineering, or related fields, or Master's degree
Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution, or experience in development in the last 3 years
Experience with machine learning (ML) tools and methods
Experience with Kubernetes, Docker, or the container ecosystem, or experience that includes strong analytical skills, attention to detail, effective communication abilities, and programming/scripting (Batch, VB, PowerShell, Java, C#, Chef, Perl, Ruby, and/or PHP)

Nice To Haves

Experience building and operating a cloud-based architecture
Experience with robotics data (sensor streams, video, point clouds) and real-time inference systems
Familiarity with model optimization techniques (quantization, pruning, distillation)
Experience with reinforcement learning or simulation-based training pipelines

Responsibilities

Design and build scalable ML training infrastructure, including distributed training pipelines and GPU cluster management both in the cloud and on-prem
Develop systems for experiment tracking, model versioning, and reproducibility
Build deployment infrastructure for serving ML models on robotic hardware with strict latency requirements
Optimize model inference for edge devices and embedded systems
Collaborate with research teams to accelerate the path from experimentation to production
Contribute to data pipelines and labeling infrastructure as needed, in partnership with the data platform team

Benefits

Health insurance (medical, dental, vision, prescription, basic life and AD&D insurance with an option for supplemental life plans, EAP, mental health support, medical advice line, flexible spending accounts, adoption and surrogacy reimbursement coverage)
401(k) matching
Paid time off
Parental leave
