About The Position

Would you like to develop the next generation of AI accelerator compute systems? Lead bleeding-edge hardware development projects? Have you heard of Amazon Web Services (AWS) Project Rainier? This is the opportunity to be part of a fast-moving innovation team that is changing the world of AI at massive scale. At AWS Trainium we develop a complete vertical stack, from our own silicon to hardware to software, and deploy it directly to our customers in our own data centers.

We are seeking experienced Lead System Design Engineers to build the next generation of our cloud server infrastructure, Project Rainier. Project Rainier is a massive $11 billion AWS AI infrastructure initiative, featuring one of the world's largest compute clusters dedicated to training and running Anthropic’s Claude AI models. It uses over 500,000 custom Trainium2 chips, designed for high-performance AI training.

As a member of the AWS Trainium Machine Learning Acceleration team, you’ll be responsible for the system design and optimization of hardware in our data centers. You’ll provide leadership in applying new technologies to large-scale server deployments in a continuous effort to deliver a world-class customer experience. This is a fast-paced, intellectually challenging position, and you’ll work with thought leaders in multiple technology areas. You’ll have high standards for yourself and everyone you work with, and you’ll constantly look for ways to improve your product's performance, quality, and cost.

We’re changing the industry, and we want individuals who are ready for this challenge and want to reach beyond what is possible today. We are looking for candidates who thrive in a fast-paced, startup-like environment and can work independently to deliver multiple projects in parallel. To be successful, you need to be highly motivated and detail-oriented while meeting the highest standards and time-to-market, cost, and quality goals.

Requirements

  • BS or MS degree in Electrical or Computer Engineering (EE / CE)
  • Minimum of 5 years of experience with High-Speed system design and validation
  • Experience with schematic and layout tools
  • Drive ODM HW development and testing and be part of the Production flow definition team
  • Strong knowledge in electrical engineering fundamentals, power & signal integrity, and analog/digital circuits
  • Able to drive selection and validation of electrical and mechanical components and cables
  • Experience with hardware development process and system development across full product life cycles
  • Experience using lab equipment such as bench power supplies, high-speed oscilloscopes, logic analyzers, spectrum analyzers, VNAs, and thermal chambers

Nice To Haves

  • Lead the end-to-end server hardware development lifecycle, from concept and architecture through design, validation, and production.
  • Drive PCB design for server motherboards, accelerator carrier boards, and high-speed interconnect boards.
  • Collaborate with silicon, firmware, and system software teams to enable optimal hardware/software co-design.
  • Improve compute density, power efficiency, and network bandwidth utilization.
  • Drive root cause analysis for hardware issues during validation and production.

Responsibilities

  • Own system design, validation, and integration of hardware in the AWS fleet through its entire life cycle
  • Work cross-functionally with AWS monitoring teams, members of the Hardware Design team, and additional teams across AWS to improve the quality and reliability of products operating in the fleet

Benefits

  • Health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance with an option for supplemental life plans, EAP, mental health support, medical advice line, Flexible Spending Accounts, and adoption and surrogacy reimbursement coverage)
  • 401(k) matching
  • Paid time off
  • Parental leave