GPU Exerciser Software Engineer

Advanced Micro Devices, Inc. · Austin, TX
Hybrid

About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you’ll discover that the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

The Role

Our growing team plays a major role in architecting and shaping AMD’s data center GPUs. As part of the Exerciser team, you will architect and develop low-level GPU test cases (“exercisers”) designed to expose silicon and software bugs. This requires close collaboration with stakeholders from the design, emulation, driver, firmware, and debug teams. Candidates with a background or strong skills in both GPU programming and systems programming (Linux kernel and userspace) will excel in this role. We encourage out-of-the-box thinking and novel approaches to tough technical challenges. Join our growing team to accelerate the introduction of cutting-edge compute products into the data center market.

Requirements

  • We are seeking a highly analytical and detail-oriented individual with proven problem-solving skills.
  • The successful candidate will possess strong technical expertise, specifically in GPU programming and Linux systems programming.
  • You should be comfortable working both collaboratively as part of a team and independently, demonstrating the ability to manage your workload efficiently with minimal supervision.

Nice To Haves

  • GPU programming (CUDA / HIP / OpenCL / SP3 assembly)
  • GPU architecture expertise
  • Microprocessor validation/verification
  • Experience with parallel programming, concurrency, and memory consistency models
  • Modern C++ programming
  • Low-level/firmware programming
  • Linux device drivers and/or kernel development
  • Linux userspace systems programming

Responsibilities

  • Design and implement innovative exercisers and stress applications that leverage deep understanding of GPU microarchitecture to uncover subtle hardware and software issues.
  • Drive complex debug efforts from failure observation through root-cause analysis, working closely with hardware, firmware, and software teams.
  • Build and enhance in-house stress and validation frameworks, improving their scalability, coverage, and ease of use for the broader engineering organization.

Benefits

  • AMD benefits at a glance.
© 2024 Teal Labs, Inc