Sr Software Development Engineer, Neuron Collectives, Annapurna Labs

Amazon | Cupertino, CA
$193,300 - $261,500 | Onsite

About The Position

Annapurna Labs, an integral part of AWS, develops critical hardware and software components for EC2 infrastructure and focuses on improving the AWS customer experience through the design of software, systems, and chips. The AWS Neuron Collectives team is seeking a Senior Software Development Engineer to optimize collective operations for AWS Trainium, a key initiative powering frontier AI models. The role involves deep optimization of compute for the specific topologies used in modern LLMs, working closely with the hardware team to maximize performance in C/C++, interfacing with DMA and firmware, and analyzing topologies in detail. The engineer will analyze current collective algorithms with tools such as Neuron Explorer, optimize them for compute and bus bandwidth utilization, and help scale AI training at AWS.

Requirements

  • Bachelor's degree in computer science or equivalent
  • 5+ years of experience building complex software systems that have been successfully delivered to customers
  • 5+ years of experience contributing to the architecture and design (design patterns, reliability, and scaling) of new and existing systems

Nice To Haves

  • Master's degree in computer science or equivalent
  • Familiarity with collective communication algorithms (e.g., all-reduce, all-gather) or distributed training frameworks

Responsibilities

  • Enhance collective algorithms and topologies for optimal training performance
  • Use tools like Neuron Explorer to identify bottlenecks in compute and bus bandwidth utilization
  • Monitor and analyze processor, DMA, firmware, and workload metrics
  • Optimize collective operations to scale AI compute across the data center
  • Work closely with the hardware team to co-optimize software and Trainium silicon
  • Develop and optimize C/C++ implementations of collective communication patterns
  • Investigate and implement improvements for specific training topologies used by modern LLMs
  • Build and maintain analysis frameworks and automation solutions

Benefits

  • Health insurance (medical, dental, vision, and prescription coverage; basic life and AD&D insurance, with the option of supplemental life plans; EAP; mental health support; a medical advice line; flexible spending accounts; and adoption and surrogacy reimbursement)
  • 401(k) matching
  • Paid time off
  • Parental leave
  • Sign-on payments
  • Restricted stock units (RSUs)