Lead AI Cluster Models Architect

Advanced Micro Devices, Inc
Austin, TX
4d
Hybrid

About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

We are looking for a dynamic, energetic Lead AI Cluster Models Architect to join our growing team. As a key contributor to the success of AMD’s product, you will be part of a leading team that drives and improves AMD’s ability to deliver the highest-quality, industry-leading technologies to market. AMD’s Systems Design Engineering team fosters and encourages continuous technical innovation, both to showcase successes and to facilitate continuous career development.

The AI Cluster Models Architect plays a critical role in shaping the future of AI/ML training and inferencing systems as the AI industry expands into inference while continuing to grow in the training market. This individual will collaborate with a broad range of internal and external partners, including System Management, OS, NOS, Compute Libraries, and Software Tools teams, to integrate state-of-the-art technology solutions that pave the way for AMD AI adoption in both inferencing and training.

Nice To Haves

  • In-depth knowledge and experience with AI clusters and topologies
  • Extensive real-world experience designing hyperscale computing clusters
  • Strong analytical and problem-solving skills and pronounced attention to detail
  • Must be a self-starter, and able to independently drive tasks to completion

Responsibilities

  • Design state-of-the-art model architectures, data, and parameter sets for large AI/ML training and inferencing systems that can be optimized for hyperscale capabilities
  • Engage with AMD’s customer base while aligning system and model architectures
  • Pioneer system and container networking strategies to facilitate seamless operation and scaling of AI clusters
  • Develop scalable AI/ML training and inferencing communication network reference architectures for each generation of AMD AI/ML products
  • Participate in the design phase of each AMD AI/ML GPU generation by developing cluster computational architectures and requirements
  • Collaborate across AMD internal and external partner teams to improve performance for AMD AI/ML clusters

Benefits

  • AMD benefits at a glance.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Number of Employees

5,001-10,000 employees
