Annapurna Labs, an integral part of AWS, develops critical hardware and software components for EC2 infrastructure, specializing in optimizing the AWS customer experience through the design of software, systems, and chips. The AWS Neuron Collectives team is seeking a Software Engineer to optimize collective operations for AWS Trainium, a key initiative powering frontier AI models. This role involves deep optimization of compute for specific topologies used in modern LLMs, working closely with the hardware team to maximize performance using C/C++, interfacing with DMA and firmware, and analyzing detailed topologies. The engineer will analyze current collective algorithms using tools like Neuron Explorer, optimize them for compute and bus bandwidth utilization, and contribute to scaling AI training at AWS.
Job Type
Full-time
Career Level
Senior