Join the EC2 Nitro Machine Learning Systems team to revolutionize supercomputing in the cloud. We're seeking an experienced Software Development Engineer to build and optimize infrastructure powering the most computationally intensive AI/ML workloads. In this role, you'll establish EC2 as the definitive source for best-known configurations across diverse ML applications while influencing future accelerated platform designs. You'll bring deep expertise in ML systems performance, working across the full stack from low-level hardware optimization to high-level frameworks. This position offers unique opportunities to translate state-of-the-art ML research into practical platform improvements, build foundational measurement infrastructure, and directly support customers with performance challenges. If you're passionate about solving complex performance optimization problems at massive scale while directly influencing product strategy, this role offers the opportunity to make a significant impact.

Your day revolves around translating technical performance data into actionable business insights while solving complex optimization challenges. You might start by analyzing performance bottlenecks in a customer's large language model training workflow, then collaborate with framework engineers to implement optimizations. Later, you'll present findings at a platform design review, where your data-driven insights directly influence future hardware decisions. Throughout the day, you'll balance immediate customer needs with long-term infrastructure development, all while helping establish processes for this newly formed team.

The EC2 Nitro Machine Learning Systems team is responsible for the development, operations, and maintenance of scale-out machine learning platforms used for training and inference workloads. We build and optimize the infrastructure that powers some of the most computationally intensive AI/ML workloads in the cloud.
Our team is passionate about creating reliable, high-performance systems that enable customers to push the boundaries of what's possible with machine learning. Working with us means having the opportunity to influence the future of supercomputing in the cloud while solving complex technical challenges at massive scale. We collaborate closely with customers and internal teams to continuously improve our platforms and deliver innovations that accelerate machine learning workflows.
Job Type
Full-time
Career Level
Mid Level