AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we’re the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain, and we’re looking for talented people who want to help. You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and people in other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.

Amazon's global data center infrastructure generates millions of operational alarms daily, and right now most of the triage, analysis, and routing of those alarms requires significant human effort from engineering operations teams. The SKG team within DC BRIDGE is building intelligent systems that fundamentally change how data center engineers interact with alarm data: reducing noise, automating triage, and applying generative AI to surface actionable insights from complex operational signals.

This is a builder role, not a maintainer role. We have a team of 8 engineers (SDE1s and SDE2s) who are talented, motivated, and ready for senior technical leadership. The technical strategy is still being defined. Generative AI hasn't been productionized yet. The opportunity is to be the person who shapes the direction, levels up the team, and gets GenAI into production for real data center operations problems. If you want to walk into a well-oiled machine and keep it running, this isn't the role.
If you want to build something from a position of real influence, keep reading.

Our customers are internal: data center operations engineers who work directly with the physical infrastructure. That means tight feedback loops, fast iteration cycles, and the ability to sit down with the people using your systems to understand what's actually working. This is real Customer Obsession, not the abstract kind.

We want you to walk in with clear eyes. Strategy is still being defined: the team has strong execution capability but needs senior technical leadership to set direction. You won't inherit a roadmap; you'll build one. Input data quality from upstream systems is rough and ripe for improvement. GenAI is unproven here: you'll need to figure out what works and how to get real value from Amazon Bedrock in an operational context, not just build demos.

Tech stack: TypeScript (primary for CDK and service code) and Python (data processing, Lambda functions, AI service integration). AWS services include Lambda, DynamoDB, SQS, SNS, S3, CloudWatch, Amazon Bedrock, API Gateway, and Route 53. CI/CD uses Amazon's internal build, test, and deployment tooling, including Hydra for integration testing and internal deployment pipelines. These aren't industry-standard tools; you'll learn them here.
Job Type
Full-time
Career Level
Mid Level