Senior Engineer - GPU Performance Optimization

Advanced Micro Devices, Inc.
Hybrid

About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE

As a Senior Software Engineer, you will serve on our performance engineering team for AMD’s core deep‑learning libraries—hipDNN, MIOpen, and Composable Kernel (CK)—with a primary focus on new GPU products and optimization on leading- and bleeding-edge hardware. You will drive performance optimization across these libraries, ensuring strong out‑of‑the‑box performance and a clear path to parity and leadership for compute workloads. This role spans multiple ASIC generations, requiring strong cross‑architecture software engineering skills and the ability to adapt, analyze, and modify kernels, heuristics, and execution strategies as hardware evolves. You will work at the intersection of kernel development, library integration, and framework enablement, contributing directly to the success of new ROCm releases and new product introductions.

THE PERSON

You are a performance‑driven engineer who thrives during new hardware bring‑up and ambiguous early‑silicon phases. You are comfortable working across abstraction layers—from kernel code and library APIs to framework‑visible performance—and you enjoy translating architectural characteristics into concrete software optimizations.
You collaborate effectively across teams, communicate performance trade‑offs clearly, and are trusted to take ownership of critical performance paths during time‑sensitive product ramps. You grow influence through technical execution, deep expertise, and mentoring.

Nice To Haves

  • Strong hands‑on experience with GPU performance engineering and kernel optimization.
  • Practical experience with deep‑learning libraries.
  • Experience supporting new hardware bring‑up or optimizing software across multiple GPU architectures.
  • Solid understanding of deep‑learning operator patterns (GEMM, convolution, attention, normalization, fusion).
  • Proficiency in C/C++, with Python for tooling, benchmarking, and analysis.
  • Experience using GPU profiling, tracing, and performance analysis tools.
  • Familiarity with framework‑level integration and validation (e.g., PyTorch or JAX).
  • Applied experience using AI‑assisted coding tools in professional software engineering workflows, including code generation, refactoring, test creation, documentation, and design exploration.

Responsibilities

  • Performance Engineering for New Products: Lead performance optimization efforts for hipDNN, MIOpen, and CK on new AMD GPU architectures. Drive early performance characterization, gap analysis, and optimization plans during pre‑silicon and post‑silicon bring‑up.
  • Kernel & Library Optimization: Implement and optimize performance‑critical kernels and operators used by hipDNN and MIOpen, leveraging other libraries where appropriate. Improve kernel selection, fusion strategies, and heuristics to maximize efficiency across diverse workloads.
  • Cross‑ASIC Software Engineering: Adapt library implementations and tuning strategies across multiple ASICs, balancing portability with architecture‑specific optimization. Identify when shared abstractions are sufficient versus when targeted specialization is required.
  • hipDNN Enablement & Transition: Contribute to hipDNN’s role as the primary execution and fusion layer, including plugin integration and performance validation. Support the transition of functionality from MIOpen into hipDNN while maintaining performance and compatibility.
  • Framework‑Facing Performance: Work closely with framework teams (e.g., PyTorch, JAX, Triton) to ensure optimized library paths are exercised in real workloads. Validate performance improvements using representative training and inference models.
  • Performance Validation & Regression Control: Define and execute performance benchmarks for new products. Help detect, diagnose, and resolve performance regressions across releases and architectures.
  • Collaboration & Mentorship: Partner with Principal engineers, architecture teams, and kernel specialists to align optimization efforts. Share best practices in kernel tuning, performance analysis, and cross‑ASIC optimization with the broader organization.
  • AI‑Assisted Development: Leverage AI‑assisted software development tools to accelerate design, implementation, review, and documentation of complex software libraries. Establish best practices for responsible use of AI assistance, including validation, review, and traceability of generated code and technical artifacts.

Benefits

  • AMD benefits at a glance.