About The Position

This role focuses on building the backbone of the Generative AI cloud at AWS, shaping the future of the cloud for AI training and inference. The position involves delivering continuous price-performance improvements for large-scale AI model training. The team designs, delivers, and operates AWS cloud offerings that enable high performance and scalability for AI/ML and HPC workloads. The ideal candidate is an innovative self-starter who knows the full technical stack, from bare-metal server hardware to userland software, and has a strong interest in cloud scale and in how systems and software decisions affect users. They should be an excellent systems debugger, a builder, and a leader with strong organizational, planning, and communication skills.

Requirements

  • Experience programming with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, Ruby
  • Experience with tools for automation (building, testing, releasing or monitoring)

Nice To Haves

  • Proficiency in Python scripting
  • Experience with highly concurrent, high-throughput systems and knowledge of complex distributed systems

Responsibilities

  • Work with engineers across the company to deliver the next-generation AWS platforms.
  • Have a direct impact on the bottom line and the ability to deliver improvements for AWS.
  • Be part of a growing, fast-paced, and fun team.
  • Take ownership of the implementation of your work.
  • See direct product improvements based on the results of your work.
  • Solve complex architectural problems that may not be defined beforehand.
  • Own the team's systems and proactively identify deficiencies.
  • Write tactical code to solve issues before they impact customers.
  • Work with your team to scale the solution.
  • Decompose large, difficult server system testability, reliability, and diagnosis problems into straightforward tasks, components, or features that you will deliver yourself and lead others to deliver in parallel.
  • Apply a combination of hardware, software, system design, x86 architecture, process, diagnosis, and operations knowledge.
  • Work with a variety of job roles (SDEs, SDETs, Hardware Engineers, TPMs, Managers, Principals) and groups (AWS Hardware Engineering, EC2, other AWS services) through server conception, test, launch, and operations.
  • Drive high quality and reliability into new and future designs for AWS Accelerated server solutions for AWS Cloud.

Benefits

  • health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave
  • sign-on payments
  • restricted stock units (RSUs)