Director, AI/ML Forward Deployment Engineering GPU

Advanced Micro Devices, Inc.
Santa Clara, CA
Hybrid

About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

The Team

AMD's Data Center GPU organization is transforming the AI and HPC landscape. Our mission is to design and market exceptional products, anchored by our Instinct™ GPU portfolio, that power the next generation of computing in enterprise data centers, cloud, and supercomputing environments. If you’re excited by AI disruption and want to be part of building something big, join us.

The Role

As AMD grows, expanding our leadership in systems engineering expertise is essential for all the new products in the AMD pipeline. Do you want to operate at the leading edge of the AI revolution, working with the largest companies in the world? Do you want to read the news every day and see where your efforts have made a difference? Make a significant impact on AMD’s Data Center GPU business and the future of AI through this new, high-visibility role!

The leader of the Advanced Forward Deployment and System Engineering team for AMD's Data Center GPU (DC GPU) business unit advises our customers, partners, engineering development teams, and customer engineering teams to deliver innovative AI software features for AMD products and systems. You will drive forward-looking leadership capabilities into our products, software, firmware, ecosystem, and ultimately some of the largest compute clusters in the world. You will develop a cohesive strategy, lead planning activities, hire teams, and deliver innovation in production AI training and inference applications.

The Person

An ambitious engineering leader with a proven track record of ensuring the success of AI software solutions delivered to end customers. Demonstrates the ability to blend technical knowledge of software applications and data center operation, and to deliver software solutions to customers as part of a cross-company team. Relishes mentoring and developing technical leaders. Brings sophisticated multi-functional alignment and team skills. Leading this effort will require an unusual mix of technical breadth and depth, problem solving, management skills, and soft skills. This role is not for the overly introverted: you communicate effectively and build respect while presenting status and solving problems with executive-level audiences.

Requirements

  • Experience managing a growing team of software support engineers with dependencies and interactions with other teams and stakeholders
  • Depth and expertise in building, running, and tuning AI models on systems from laptops to multi-node clusters
  • Experience with and practical knowledge of how compute, network, and storage architecture come together in building large AI/ML clusters
  • Experience solving functionality and performance issues with multi-node networking tests and inference/training workloads
  • Equally fluent with software, firmware, and hardware
  • Bachelor’s Degree (BS) in Electrical Engineering, Computer Engineering, Computer Science or other relevant field

Nice To Haves

  • Master’s Degree (MS) preferred

Responsibilities

  • Lead a multi-functional team to deliver and support leading-edge AI software for AMD’s customers
  • Partner with AMD's broader Software team to drive definition and alignment of AI model implementations for AMD platforms as part of a larger team of AI Customer Engineering HW/FW validation and debug engineers
  • Hold a key seat at the table as part of a team responsible for successful end-to-end delivery of large systems to top AI industry customers
  • Grow an advanced AI software development team to augment existing engineering teams and program/product management teams: implement, optimize, tune, and test at the server and cluster level
  • Assess the current state of AMD’s HW, FW, SW and ecosystem capabilities and define generational improvements while engaging with existing cross-company efforts to deliver full system solutions
  • Define and enter into strategic ecosystem partnerships for solution components
  • Ensure system assessments and customer platform attainment to deliver industry-best products
  • Develop positive relationships with key people across the organization and development partners
  • Work with architects to set design and development plans and objectives (IP, SoC, Power, Firmware, System, Network, Compute, Storage and Workloads)

Benefits

  • AMD benefits at a glance.