Principal ML/AI Solutions Engineer

Advanced Micro Devices, Inc. • Santa Clara, CA

About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE: AMD’s Software and Solutions Team is seeking a Principal ML/AI Solutions Engineer to empower customers and partners in adopting AMD’s AI software stack. This role requires strong technical depth in machine learning frameworks, GPU workflows, and enterprise architecture, paired with excellent communication and customer-facing skills. The role spans development, integration, deployment, and performance analysis, ensuring innovative, efficient, and scalable AI solutions on AMD GPUs.

THE PERSON: A highly motivated and passionate professional with deep expertise in ML/AI systems and a proven track record in collaboration, problem-solving, and technical execution. The ideal candidate thrives in fast-paced, cross-functional, highly technical environments working closely with customers, ISVs, cloud partners, and engineering teams. This individual excels at engaging directly with enterprise stakeholders to understand complex workloads, lead technical deep dives, deliver reference architectures, and guide solution integrations.

Requirements

Extensive expertise in ML/AI engineering, distributed training, and GPU computing, along with strong experience in customer engagement, enterprise solution design, and technical advocacy.
Knowledge of modern ML workloads (LLMs, generative AI, RAG, CV, fine-tuning) and hands-on experience with PyTorch, TensorFlow, JAX, inference runtimes, and containerized GPU environments is essential.

Nice To Haves

Familiarity with ROCm, CUDA, AMD MI300/MI325 platforms, MLOps systems, data pipelines, and cloud-native deployment architectures is highly desirable.
Additional experience developing demos, validated designs, reproducible notebooks, or technical content is strongly preferred.
A strong understanding of industry trends and competitive AI technologies is necessary to guide scalable solution development and optimize workload performance.

Responsibilities

Work across reference design creation, workflow optimization, performance tuning, and ecosystem enablement.
Collaborate with engineering teams, ISVs, cloud partners, and enterprise customers to guide real-world AI deployments and translate customer insights into actionable engineering direction.
Drive innovation by prototyping and demonstrating end-to-end AI workflows across training, fine-tuning, and inference; developing validated designs, notebooks, and performance playbooks; and providing structured feedback to AMD’s ROCm, frameworks, platform, and performance engineering teams.
Champion AMD technologies within the AI community, build demos and best-practice content, influence product roadmaps, validate ecosystem integrations such as vLLM and Triton, and support partner technology readiness—ultimately advancing AMD’s AI ecosystem and improving workload efficiency at scale.