Annapurna Labs, an integral part of AWS, develops critical hardware and software components for EC2 infrastructure and specializes in optimizing the AWS customer experience through the design of software, systems, and chips. The AWS Neuron Collectives team is seeking a Software Engineer to optimize collective operations for AWS Trainium, a key initiative powering frontier AI models. The role involves deeply optimizing compute for the specific topologies used in modern LLM training, investigating those topologies in detail, collaborating closely with the hardware team, and pushing for maximum performance in C/C++ while interfacing with DMA and firmware. You will analyze current collective algorithms using tools like Neuron Explorer and optimize them to fully utilize compute and bus bandwidth for data center scaling, making an impact on AI training at AWS scale while growing your technical expertise.
Job Type
Full-time
Career Level
Mid Level