Technical Program Manager – ROCm Libraries

Advanced Micro Devices, Inc. • Austin, TX

Hybrid

About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

Do you want to drive end-to-end delivery of artificial intelligence, math, computer vision, and communication libraries to enable high-performance computing and artificial intelligence? AMD is searching for a talented and motivated Program Manager to join the GPU libraries team developing Math Libraries as part of the AMD ROCm™ Open Software Platform.

You are accustomed to working in a dynamic, geographically distributed agile team, where partnership and collaboration are paramount. You possess excellent written and verbal communication skills, strong organization, and attention to detail. With a keen interest in data, you will draw on your strong technical background, analytical capabilities, and interpersonal skills to drive process improvements, improve program operations, and engage with key partners at all levels, from driving technical discussions with developers through to communication with executives.

As a TPM in this group, you will not just track schedules; you will be an active technical partner. You will facilitate architectural decision-making, assist with the triage and debugging of complex issues, and translate intricate technical requirements for diverse stakeholders ranging from kernel engineers to executive leadership.

You will operate at the intersection of High-Performance Computing (HPC) and Artificial Intelligence (AI), ensuring our math libraries deliver world-class performance on AMD Instinct™ and AMD Radeon™ accelerators.

AMD is seeking a Technical Program Manager to join the ROCm Libraries organization and lead execution across the MIOpen, Composable Kernel (CK), and hipDNN software stack. In this role, you will be responsible for end‑to‑end program execution across a complex, performance‑critical set of GPU software libraries that support training and inference workloads on AMD Instinct™ GPUs. You will work closely with engineering and product leadership to drive predictable delivery, execution rigor, and continuous improvement of software development practices across multiple teams. This role operates at the intersection of kernel performance, library architecture, build systems, and customer‑driven requirements, requiring strong technical judgment, structured execution, and clear communication across deeply technical stakeholders.

You have experience operating in technically complex software environments where performance, correctness, and platform compatibility are critical. You are comfortable navigating ambiguity and are effective at introducing structure, execution discipline, and data‑driven decision‑making. You are able to translate complex engineering trade‑offs into clear plans, risks, metrics, and delivery commitments, and you communicate effectively with both engineering teams and senior stakeholders.

Requirements

Program Management: 5+ years of experience in Technical Program Management, Engineering Management, or as a Senior Software Engineer with leadership responsibilities.
Compiler & Architecture Knowledge: Experience working with or managing projects involving compiler technologies (e.g., LLVM, GCC, Open64).
Understanding of Code Generation techniques, JIT compilation, or Intermediate Representations (IR) (e.g., MLIR, LLVM IR) and how they impact library performance.
GPU & HPC Domain: Deep understanding of GPU computing (AMD ROCm, CUDA, or OpenCL).
Familiarity with High-Performance Computing (HPC) and Artificial Intelligence (AI) workloads.
Knowledge of linear algebra concepts (GEMM, Sparse Matrices, Tensor operations).
Technical Literacy: Ability to read and understand technical documentation, bug reports, and basic code structures in C++ and Python; familiarity with Assembly/ISA is a plus.
Process & Tools: Proficiency with Agile/Scrum methodologies and tools like Jira, Confluence, and GitHub.
Library Development: Hands-on experience developing or managing math libraries (BLAS, FFT, LAPACK) or similar performance-critical software.
Bachelor’s or Master’s degree in Computer Science, Software Engineering, Electrical Engineering, Mathematics, or a related technical field (or equivalent) strongly preferred.
Experience managing complex, interdependent software programs in GPU software, deep learning infrastructure, or high‑performance computing.
Prior experience as a software developer, systems engineer, or technical program manager working close to performance‑critical or low‑level software.
Demonstrated success improving software development lifecycle maturity, execution discipline, and delivery predictability.
Familiarity with GPU software stacks, kernel libraries, and related tooling (e.g., HIP, math libraries, build systems).
Strong analytical, reporting, and executive communication skills.
Experience applying different execution models based on product and engineering needs.
Proficiency with Jira, Confluence, and common program management tools.

Nice To Haves

Formal project or program management education or certifications (e.g., PMP, Agile, Scrum) are a plus.

Responsibilities

Program Execution & Delivery: Drive the end-to-end lifecycle of the ROCm BLAS stack, including release planning, roadmap definition, and execution for rocBLAS, hipBLASLt, and hipSPARSELt.
Compiler & Code Gen: Facilitate discussions on code generation strategies (e.g., auto-tuning kernels) and optimizations at the Intermediate Representation (IR) level to maximize hardware utilization.
Technical Triage & Debugging: Actively participate in the triage process for software defects. Leverage your technical background to help engineers prioritize bugs, understand root causes (whether in the library or the compiler backend), and unblock critical paths.
Architectural Decision Support: Facilitate technical discussions between software architects, research teams, and silicon engineers. Help drive consensus on API designs and optimization strategies for new hardware generations.
Stakeholder Management: Translate complex technical requirements and status updates into clear, actionable communications. Bridge the gap between technical engineering teams and management.
Lead end-to-end delivery of MIOpen, hipDNN, and Composable Kernel across multiple GPU architectures.
Coordinate execution across performance‑critical kernels, APIs, backend integrations, and build systems.
Translate product and customer requirements into clear plans, schedules, and deliverables.
Identify and mitigate risks related to performance, compatibility, build complexity, and cross‑team dependencies.
Provide clear, regular status reporting on progress, risks, and execution health.
Define and evolve execution models (Agile, hybrid, milestone‑driven, performance‑driven) appropriate to team and product needs.
Establish best practices for planning, estimation, prioritization, dependency management, and delivery accountability.
Assess execution maturity across teams and drive pragmatic, adoptable improvements in partnership with engineering leadership.
Define and track meaningful execution metrics (e.g., predictability, cycle time, throughput, planning accuracy).
Use data and retrospectives to drive continuous improvement and measurable outcomes.
Build durable execution mechanisms, including reviews, retrospectives with follow‑through, and shared visibility dashboards.