About The Position

As a Senior Member of Technical Staff in the GPU Libraries group, you will provide technical leadership and strategic support across the AMD Radeon Open Ecosystem (ROCm). This role focuses on critically analyzing, reviewing, and improving GPU kernel algorithms within the Composable Kernel (CK) and MIOpen libraries, with a strong emphasis on performance tuning through both explainable heuristics and empirical benchmarking. Working with library owners, kernel developers, and cross-functional performance engineering teams, you will drive kernel optimization strategies that translate directly into measurable gains for AI/ML and HPC workloads on AMD Instinct accelerators and Radeon GPUs. This position requires deep technical understanding and the ability to reason about performance from first principles, while also using data-driven analytics to guide tuning decisions.

Requirements

  • Extensive and broad hands-on experience with C++, with relevant applied experience using CUDA, HIP, OpenMP, MPI, or OpenCL for accelerated computing on CPUs and GPUs. Familiarity with other programming languages (e.g., Python, Rust). Knowledge of or applied experience with popular AI/ML frameworks (PyTorch, TensorFlow, JAX).
  • Proven experience with kernel performance tuning — both through principled heuristic design and through systematic empirical benchmarking. Ability to articulate why a tuning configuration works, not just that it does. Ability to reason about performance at the hardware level and translate architectural insight into kernel optimization strategies.
  • Familiarity with Composable Kernel (CK), MIOpen, or equivalent GPU kernel libraries (e.g., CUTLASS, cuDNN, NeuronSDK). Understanding of GEMM, convolution, attention, pooling, and other core compute primitives used in AI/ML workloads.
  • Applied experience using AI-assisted coding tools in professional software engineering workflows, including code generation, refactoring, test creation, documentation, and design exploration.

Nice To Haves

  • An advanced degree (M.Sc./M.Eng. or Ph.D.) is preferred, ideally in Computer Science, Computer Engineering, Electrical Engineering, Applied Mathematics, or a related field with a focus on high-performance computing, GPU architecture, or numerical methods.

Responsibilities

  • Algorithm Analysis & Improvement: Critically review and improve kernel algorithms, identifying opportunities for redesign, fusion, and optimization that yield measurable performance gains across AMD GPU architectures.
  • Design explainable heuristic models for kernel selection, tile-size determination, data layout choices, and workload-to-CU mapping — ensuring tuning decisions are interpretable, maintainable, and adaptable.
  • Partner with teams to execute large-scale kernel benchmarking campaigns, building data pipelines and analytics workflows to process, visualize, and extract actionable insights from extensive performance datasets.
  • Partner with teams to perform deep-dive performance investigations using AMD profiling and tracing tools (rocProf, Omniperf, Omnitrace), correlating hardware counter data with kernel behavior to identify bottlenecks in compute, memory bandwidth, LDS utilization, Matrix Core throughput, and instruction issue rates.
  • Initiate, influence, and drive architecture, design, and documentation efforts as they arise across teams. Work closely with principal engineering staff to plan and execute technical governance activities across integrated libraries and engineering teams.
  • Leverage AI-assisted software development tools to accelerate design, implementation, review, and documentation of complex software libraries. Establish best practices for responsible use of AI assistance, including validation, review, and traceability of generated code and technical artifacts.

Benefits

  • AMD benefits at a glance.