Software Development Engineer, EC2 UltraServer Delivery Team

Amazon, Seattle, WA
$143,700 - $194,400, Onsite

About The Position

The Software Development Engineer II will design, build, and maintain cloud-based provisioning workflows for NVIDIA GB200/GB300 UltraServers, orchestrating complex multi-asset systems from infrastructure handoff to production delivery. This role requires expertise in AWS services, system architecture, and cross-functional collaboration with Manufacturing, Operations, and Program Management teams to deliver AI/ML infrastructure.

Requirements

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • Experience programming in at least one software programming language

Nice To Haves

  • 3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Bachelor's degree in computer science or equivalent
  • Knowledge of professional software engineering best practices for the full software development life cycle, including coding standards, software architectures, code reviews, source control management, continuous deployments, testing, and operational excellence

Responsibilities

  • Design and architect cross-functional solutions that serve Manufacturing, Operations, and Program Management
  • Work in environments where the technology strategy is defined but the solution design is not
  • Build solutions that are stable, logical, testable, and efficient with the ability to independently make trade-off decisions
  • Investigate and develop design concepts to frame solution sets at an application and product level
  • Build cloud-based solutions using AWS-native services to scale infrastructure frameworks
  • Write high-quality, maintainable code with proper testing and code reviews
  • Develop and maintain the Multi-Asset Provisioning Service workflows for GB200 and GB300 UltraServer hosts
  • Implement automation for hardware testing, cable validation, and related test processes
  • Create observable systems with appropriate metrics and alarming
  • Execute and monitor UltraServer provisioning workflows
  • Troubleshoot workflow failures and coordinate with downstream teams
  • Focus on operational excellence by identifying problems and proposing solutions that improve manufacturing software
  • Work with hardware and software integrations specific to GPU clusters and AI/ML training systems
  • Manage network partition configurations for multi-node GPU clusters
  • Handle firmware validation and consistency checks across asset groups
  • Collaborate with customers and stakeholders to convert business needs into technical designs
  • Participate in code reviews and technical assessments

Benefits

  • health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave
  • sign-on payments
  • restricted stock units (RSUs)
© 2026 Teal Labs, Inc