About The Position

Amazon Music is an immersive audio entertainment service that deepens connections between fans, artists, and creators. From personalized music playlists to exclusive podcasts, concert livestreams to artist merch, Amazon Music is innovating at some of the most exciting intersections of music and culture. We offer experiences that serve all listeners across our tiers of service: Prime members get access to all the music in shuffle mode, plus top ad-free podcasts, included with their membership; customers can upgrade to Amazon Music Unlimited for unlimited, on-demand access to 100 million songs, including millions in HD, Ultra HD, and spatial audio; and anyone can listen for free by downloading the Amazon Music app or via Alexa-enabled devices. Join us for the opportunity to influence how Amazon Music engages fans, artists, and creators on a global scale. Learn more at https://www.amazon.com/music.

We are seeking a Machine Learning Engineer to join the Amazon Music AI and Personalization team and drive improvements in model training efficiency and inference optimization. In this role, you will work at the intersection of machine learning and systems engineering, ensuring our models train faster, cost less, and run efficiently in production. You will collaborate closely with research scientists, platform engineers, and product teams to deliver scalable, high-performance ML solutions that help customers discover music and content they love.

Requirements

  • Bachelor's degree in computer science or equivalent
  • 3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Experience in machine learning, data mining, information retrieval, statistics, or natural language processing
  • Experience programming in at least one modern language such as Java, C++, or C#, including object-oriented design
  • Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution
  • Experience building complex software systems that have been successfully delivered to customers
  • Experience with machine and deep learning toolkits such as MXNet, TensorFlow, Caffe, and PyTorch
  • Experience with production systems, monitoring, and metrics reporting
  • Experience building, deploying, and maintaining large-scale machine learning infrastructure using distributed data processing frameworks such as Spark or Ray
  • Experience owning and operating production services, including on-call responsibilities, incident management, and operational metrics

Nice To Haves

  • Master's degree in computer science or equivalent
  • Expertise in large-model inference optimization, including techniques such as quantization, pruning, and distillation
  • Demonstrated experience designing semantic search or RAG pipelines, integrating embeddings, vector stores, and generative models
  • Proficiency in online and offline experimentation, evaluation frameworks, and metrics instrumentation for ML systems
  • Experience with service-oriented architectures, microservices design patterns, and managing service dependencies in complex ML systems
  • Strong collaboration and communication skills, with the ability to bridge science and engineering to deliver end-to-end ML solutions
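To illustrate one of the inference-optimization techniques named above, the sketch below shows post-training int8 quantization in its simplest form: mapping floats onto an 8-bit integer range with a scale and zero point. This is a generic, self-contained toy (not Amazon Music's actual stack, which would use framework-level tooling over real model weights):

```python
def quantize_int8(values):
    """Affine (asymmetric) int8 quantization of a list of floats.

    Returns (quantized ints, scale, zero_point) such that
    value ~= (q - zero_point) * scale.
    """
    lo, hi = min(values), max(values)
    qmin, qmax = -128, 127
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale for constant inputs
    zero_point = round(qmin - lo / scale)
    # Round to the nearest int8 code and clamp into range.
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from int8 codes."""
    return [(qi - zero_point) * scale for qi in q]
```

The round trip loses at most about one quantization step (`scale`) per value, which is the basic size/accuracy trade-off that production quantization tooling manages at scale.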

Responsibilities

  • Design and implement strategies to improve training throughput and reduce time-to-convergence
  • Profile and eliminate bottlenecks in data loading, preprocessing, and model computation
  • Develop and maintain training infrastructure that scales efficiently with model and dataset size
  • Optimize models for low-latency, high-throughput production inference
  • Implement and benchmark inference optimizations across various hardware targets (GPU, CPU, edge devices)
  • Establish performance benchmarks and monitoring for inference pipelines
  • Own production services that support ML decision models, including ranking services, orchestration layers, and model-serving infrastructure
  • Participate in on-call rotation to ensure service reliability, respond to operational issues, and drive continuous improvement
  • Design and implement monitoring, alerting, and observability solutions for ML services to proactively identify and resolve issues
  • Manage service dependencies, API contracts, and integration points between ML models and downstream systems
  • Drive operational excellence through automation, runbook development, and post-incident reviews
  • Partner with research teams to understand model architectures and identify optimization opportunities
  • Collaborate with Science/ML teams on service integration points and ownership boundaries for ML components
  • Contribute to best practices and tooling for ML efficiency across the organization
  • Evaluate emerging hardware and software technologies for potential adoption
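As a flavor of the "establish performance benchmarks" responsibility, a minimal latency-benchmarking harness might look like the following. This is a hypothetical sketch (function names and defaults are invented for illustration); real inference benchmarking would also pin hardware, control batch sizes, and record throughput:

```python
import time
import statistics

def benchmark(fn, warmup=10, iters=100):
    """Time repeated calls to `fn` and report p50/p95/p99 latency in ms."""
    for _ in range(warmup):  # warm caches and any lazy initialization before timing
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank percentile over the sorted samples.
    pct = lambda p: samples[min(len(samples) - 1, int(p / 100 * len(samples)))]
    return {"p50": pct(50), "p95": pct(95), "p99": pct(99),
            "mean": statistics.mean(samples)}
```

Reporting tail percentiles (p95/p99) rather than only the mean is the usual practice for serving systems, since user-facing latency is dominated by the slowest requests.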

Benefits

  • health insurance (medical, dental, vision, and prescription coverage; Basic Life & AD&D insurance with optional supplemental life plans; EAP; mental health support; medical advice line; Flexible Spending Accounts; adoption and surrogacy reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave