At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

As a Sr Engineer AI Infrastructure Validation, you will architect, validate, and debug large-scale AMD GPU systems spanning device, node, chassis, and rack-level deployments. You will define system-level test strategies for multi-GPU, multi-node accelerator platforms, ensuring correctness, performance, scalability, and reliability across hardware and software boundaries. This role is deeply technical and hands-on, involving GPU bring-up, firmware/driver interaction, networking validation (RDMA), and large-scale cluster enablement. You will directly influence product readiness and future AMD GPU platform designs by providing system-level feedback into architecture, silicon features, and validation infrastructure.

THE PERSON:

You are a system thinker with deep technical instincts, capable of root-causing failures that span GPU silicon, PCIe/Infinity Fabric, networking, drivers, firmware, and orchestration layers. You are comfortable debugging issues that only emerge at scale: during long-running workloads, high-throughput fabric stress, or multi-node synchronization scenarios.
You bring:
- Proven technical leadership in complex GPU/accelerator environments
- The ability to translate low-level failures (timeouts, hangs, data corruption) into actionable root causes
- Strong collaboration skills across Architecture, Design, Firmware, Software, and Validation teams
- A track record of building robust, repeatable test infrastructure, not one-off debug scripts
Job Type: Full-time
Career Level: Mid Level
Number of Employees: 5,001-10,000 employees