About the Position

Annapurna Labs, an integral part of AWS, develops critical hardware and software components for EC2 infrastructure, designing the software, systems, and chips that optimize the AWS customer experience. The AWS Neuron Collectives team is seeking a Software Engineer to optimize collective operations for AWS Trainium, a key initiative powering frontier AI models.

In this role, you will deeply optimize compute for the specific topologies used in modern LLM training, collaborating closely with the hardware team to push for maximum performance. The work is done in C/C++ and involves interfacing with DMA and firmware and investigating topologies in detail. You will analyze current collective algorithms with tools such as Neuron Explorer and optimize them to fully utilize compute and bus bandwidth as training scales across the data center. Along the way, you will have an impact on AI training at AWS scale while growing your technical expertise.

Requirements

  • Experience building complex software systems that have been successfully delivered to customers
  • Experience contributing to the architecture and design (design patterns, reliability, and scaling) of new and current systems
  • Bachelor's degree in computer science or equivalent
  • Knowledge of engineering practices and patterns for the full software, hardware, and network development life cycle, including coding standards, code reviews, source control management, build processes, testing, certification, and live-site operations
  • Software development experience within the last 3 years, or embedded development experience in C/C++

Nice to Haves

  • Master's degree in computer science or equivalent
  • Experience with hardware/software integration and real-time systems
  • Familiarity with collective communication algorithms (e.g., all-reduce, all-gather) or distributed training frameworks

Responsibilities

  • Enhance collective algorithms and topologies for optimal training performance
  • Use tools like Neuron Explorer to identify bottlenecks in compute and bus bandwidth utilization
  • Monitor and analyze processor, DMA, firmware, and workload metrics
  • Optimize collective operations to scale AI compute across the data center
  • Work closely with the hardware team to co-optimize software and Trainium silicon
  • Develop and optimize C/C++ implementations of collective communication patterns
  • Investigate and implement improvements for specific training topologies used by modern LLMs
  • Build and maintain analysis frameworks and automation solutions

Benefits

  • Health insurance (medical, dental, vision, and prescription coverage; basic life and AD&D insurance with an option for supplemental life plans; EAP; mental health support; a medical advice line; flexible spending accounts; and adoption and surrogacy reimbursement coverage)
  • 401(k) matching
  • Paid time off
  • Parental leave
  • Sign-on payments
  • Restricted stock units (RSUs)
© 2026 Teal Labs, Inc