About The Position

Unlock the future of AI at Amazon AGI (Artificial General Intelligence). At Amazon, we're at the forefront of transformative AI, shaping the next generation of intelligent technologies. For over 25 years, we've developed state-of-the-art AI solutions that transform how businesses serve their customers. Today, as AI stands ready to reshape society, we're pushing beyond current breakthroughs in generative AI toward the next frontier. Join our team of scientists, engineers, and experts to help define the future of artificial intelligence. AGI is dedicated to pushing the boundaries of what's possible, using Amazon's unparalleled ML infrastructure, computing resources, and commitment to responsible AI principles. We're looking for the brightest minds from a wide range of backgrounds and experiences to help create transformative AI solutions that will improve lives, solve global challenges, and open up new realms of possibility, from reinventing commerce and accelerating enterprise productivity to advancing universal agents and shaping the future of robotics.

We are looking for a talented Senior Machine Learning Engineer to help us develop state-of-the-art, next-generation web search capabilities within Amazon AGI.

A day in the life

You will work with a multidisciplinary team across multiple programs to:

  • Build and automate training data generation: You will build a data pipeline that produces high-quality training data sets for our web information retrieval and ranking models, with a direct and significant impact on our search quality. You will help improve data quality, including mining for hard negatives and incorporating dimensions of quality (e.g., relevance, content freshness, page trustworthiness), as well as scaling the pipeline to billions of examples. You will work closely with scientists to address their specific modeling needs and help develop metrics to communicate your progress on data quality and scale.
  • Accelerate experimental velocity: Develop high-leverage systems that enable the team to experiment faster. This includes centralizing evaluation workflows and developing tools and systems to streamline production model optimization and deployment.
  • Improve system understandability: Develop advanced analytics and automate failure-space analysis to help the team debug and understand search quality issues. You will partner with the broader AGI analytics team to connect metrics from our information retrieval engine with user signals and downstream dependencies, enabling debugging across systems.
  • Push model performance to its limits: Optimize model inference to maximize hardware utilization, reduce GPU inference latency, and balance quality/performance trade-offs.

Requirements

  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience with at least one software programming language
  • 5+ years of experience leading the design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • Experience as a mentor, tech lead or leading an engineering team
  • 3+ years of experience with technologies such as AWS, Glue, Redshift, Airflow, Spark, Kafka, Kubernetes, Redis
  • 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
  • Bachelor's degree in computer science or equivalent
  • Experience designing or architecting (design patterns, reliability, and scaling) new and existing systems
  • Experience in machine learning, data mining, information retrieval, statistics or natural language processing
  • 2+ years of experience building large-scale machine-learning infrastructure for online recommendation, ads ranking, personalization, or search
  • 2+ years of experience with technologies such as SageMaker, Triton, PyTorch, ONNX, MLflow, Flyte

Responsibilities

  • Build and automate training data generation
  • Accelerate experimental velocity
  • Improve system understandability
  • Push model performance to its limits: optimize model inference to maximize hardware utilization, reduce GPU inference latency, and balance quality/performance trade-offs