WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary.

When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE TEAM

The ROCm software organization at AMD builds and maintains the open-source GPU software stack powering AI training, inference, and HPC workloads across AMD's data center and consumer GPU portfolio. ROCm is the foundation on which developers, researchers, and enterprises run their most demanding AI and HPC workloads. Quality and reliability are existential to our success.

We operate at the intersection of cutting-edge hardware and software, and we move fast. Our team is deeply invested in open-source, community-driven development and in engineering excellence at every layer of the stack.

THE ROLE

We're looking for a hands-on Director of Test Engineering to lead and transform the quality function for ROCm. This is not a program management role; it's a deeply technical leadership position for someone who understands the hardware/software interface of GPUs, has built test engineering organizations from the ground up, and is ready to lead the next wave of AI-native, agentic quality engineering.

You will own the vision, strategy, and execution of test engineering for ROCm, from kernel-level driver validation to user-space ML framework testing. Critically, you will be the driving force behind scaling your team's impact through AI and agentic tooling, building a modern, autonomous quality organization that moves faster than any traditional QA team could.

THE IMPACT YOU WILL HAVE

Define and own the test engineering strategy for ROCm across the full HW/SW stack, from driver interfaces to ML framework validation.
Transform the quality organization into an AI-first, agentic team, scaling coverage, speed, and reliability without proportional headcount growth.
Build and operate continuous testing and validation infrastructure, including long-running soak, stress, failure/recovery, and staging environments for product reliability.
Raise the bar on test engineering discipline: shift-left practices, SDET-caliber test development, and deep ownership of quality metrics.
Partner directly with hardware, firmware, and software engineers to ensure quality is embedded at every stage of development.
Drive adoption of AI-assisted testing workflows, intelligent test selection, automated root cause analysis, and agentic CI/CD pipelines across the organization.

THE PERSON

The ideal candidate is a technical leader who has built and scaled test engineering teams in complex, hardware-adjacent software environments. You are hands-on when it matters: able to prototype a test framework, debug a GPU driver failure, or design a validation architecture.

You also understand how customers actually use the product: the AI inference and training workloads they run, the parallelism strategies they deploy, the performance they expect, and the failure modes they hit. That customer-workload knowledge is what separates a QA team that writes black-box sanity checks from one that designs tests targeting the exact code paths real users exercise.

You see AI agents not as a novelty but as the primary lever for scaling your team's output. You are impatient with manual, reactive QA and energized by building systems that catch bugs before humans even see them.
Job Type
Full-time
Career Level
Director
Education Level
No Education Listed