Senior Fellow, ML Optimization

Advanced Micro Devices, Inc. • San Jose, CA

About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

The Role

The Senior Fellow role is a company-level technical leadership position, accountable for defining and driving AMD’s strategy, architecture, optimization, and tooling to achieve industry-leading ML workload performance on AMD GPUs. You will partner across hardware architecture, AI frameworks, compilers, runtime, ROCm, developer tools, and model teams to scale performance analysis and optimization.

As Senior Fellow of ML workload performance, you will drive end-to-end performance attainment across the entire software stack, focusing on achieving the best performance on multiple generations of AMD GPUs across a wide range of models, including the latest state-of-the-art AI models. You will set the strategy and roadmap for general optimization, accelerating support for new models, and out-of-the-box performance. If you are passionate about performance optimization, getting the most out of the hardware, and shaping the future of AI acceleration, this role is for you.

The Person

The ideal candidate will have deep knowledge of ML hardware architecture, software optimization, performance modeling, AI frameworks, and the latest trends in inference and training optimization, along with hands-on experience mapping model architectures to low-level software and hardware and understanding how each layer of the stack affects model performance. Strong knowledge of the latest generative model architectures (especially SoTA models), distributed inference, and deployment at scale is crucial.

Requirements

Multiple years of technical experience in performance optimization.
Strong technical expertise and experience in performance analysis, projection, and hardware architecture.
Deep knowledge of and hands-on experience with AI frameworks such as PyTorch, JAX, vLLM, and SGLang.
Strong technical leadership skills and the ability to work collaboratively with cross-functional teams.
Ability to mentor, coach, and inspire a diverse and talented team of researchers and engineers.
Excellent written, verbal, and presentation skills, and the ability to coordinate internally and externally.
A PhD, or a Master's degree plus equivalent experience, in computer science, electrical engineering, or a related field.

Responsibilities

Set strategy and roadmap for AMD model optimization.
Tune, profile, and analyze the performance of large-scale LLM, diffusion, multimodal, RecSys, and generative AI models, both single-node and distributed, and explore the associated tradeoffs and design decisions.
Participate in hardware-software co-design to optimize future hardware for a variety of ML workloads.
Develop and improve frameworks, tools, and infrastructure for performance estimation, modeling, and reporting.
Communicate and present the results of performance analysis and modeling to stakeholders and senior leadership, and provide concrete recommendations.
Collaborate across teams and the wider organization to identify opportunities and develop strategies.