Amazon's Intelligent Cloud Hosting (ICON) team is looking for a Software Development Engineer (SDE). ICON is responsible for the reliability and operational excellence of Amazon's cloud hosting infrastructure, supporting all of Amazon's global marketplaces, partner portals, and consumer experiences including Kindle, Alexa, Amazon Video, and the Mobile Application. The team builds intelligent systems that proactively detect, diagnose, and resolve incidents across hundreds of thousands of services powering one of the world's largest distributed architectures.

The challenges SDEs solve on this team are high-impact and mission-critical. The team is building AI-powered incident response systems that automatically investigate production issues, identify root causes from metrics, logs, and deployment events, and recommend mitigations to on-call engineers. These systems operate at massive scale, processing thousands of signals per investigation and reducing mean time to resolution for critical production incidents.

As an SDE II on the team, you will:
- Design and build production generative AI workflows that automate incident investigation, from alert ingestion through root-cause analysis to mitigation recommendations.
- Work on tier-1, multi-tenant, high-performance systems built on AWS services (Step Functions, Bedrock, DynamoDB, Athena) with technical challenges unique to this kind of scale and throughput.
- Build developer productivity and operational tooling, including orchestration, predictive analytics, automated diagnosis, and self-healing systems.

The team is looking for engineers who are passionate about applying generative AI and machine learning to operational problems, thrive in ambiguous environments, and want to build systems that keep Amazon's infrastructure running for millions of customers worldwide.
Job Type
Full-time
Career Level
Mid Level