Software Development Engineer, ML Systems, Annapurna Labs

Amazon•New York, NY

1d•Onsite

About The Position

The Neuroboros team, part of Amazon Annapurna Labs within AWS UC, focuses on leveraging and expanding Generative AI technologies to benefit customers through Amazon's Machine Learning hardware. This role is key to Annapurna Labs' strategy of creating a new hub in NYC to attract top talent to work on challenging problems with state-of-the-art tooling.

Amazon Annapurna Labs builds silicon and software for AWS customers, improving cloud infrastructure for high-performance machine learning with AWS Neuron and the Inferentia and Trainium ML chips, as well as for networking and computing. AWS is the world's most comprehensive cloud platform. AWS Neuron is the software stack for Trainium and Inferentia chips, delivering high-performance ML inference and training at the lowest cost. Neuron includes an ML compiler and integrates with popular ML frameworks. It is used by external customers like Anthropic and Databricks, and by internal customers like Alexa and Amazon Bedrock.

This role involves applying AI to AI to simplify and accelerate customer adoption of Neuron. You will build agents, tools, and models; partner with customers to identify obstacles and opportunities for migration to AWS's ML silicon; and drive impact through AI agents and tools critical to AWS's Generative AI business.

Requirements

3+ years of non-internship professional software development experience
2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
Experience programming with at least one software programming language
Computer Science fundamentals: object-oriented design, data structures, and performance analysis, with at least 2 programming languages
Experience in one or more of the following areas: ML compilers, production coding agents, GenAI model architecture, model training, neural network optimization, or, alternatively, applied math

Nice To Haves

3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
2+ years of experience in machine learning or other computational modeling environments, with an emphasis on hosting, building, or optimizing models for diverse hardware platforms
Proven track record in building AI agents that automate ML workload optimization, ML compiler tuning, distributed inference and training, or ML kernel authoring and optimization
Experience working with open-source software communities in the optimization space or related areas
Knowledge of state-of-the-art technology used in the Machine Learning space and its mathematical underpinnings

Responsibilities

Research implementations that deliver the best possible experiences for customers.
Deliver on goals to reduce the time and effort it takes to port and optimize Machine Learning workloads on Neuron.
Solve challenging technical problems, often ones not solved before, at every layer of the stack.
Design, implement, test, deploy, and maintain innovative software solutions that improve service performance, durability, cost, and security.
Build high-quality, highly available, always-on products.
Potentially contribute intellectual property through patents.

Benefits

health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
401(k) matching
paid time off
parental leave
sign-on payments
restricted stock units (RSUs)
