High-Performance AI Inference Solutions Engineer

Advanced Micro Devices, Inc.
San Jose, CA (Hybrid)

About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you’ll discover that the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

The AMD AI Group (AIG) is seeking an experienced MTS/Senior Software Development Engineer to drive high-performance AI inference solutions on AMD Instinct GPUs. This role combines deep expertise in compiler technology, GPU kernel optimization, and modern deep learning frameworks to deliver production-grade inference performance across AMD’s current and next-generation accelerator lineup—from the MI300X and MI350/MI355X shipping today to future GPU products. You will work at the intersection of model optimization, kernel development, and serving infrastructure to ensure AMD GPUs deliver world-class inference throughput and latency.

AMD is looking for a specialized software engineer who is passionate about improving the performance of key applications and benchmarks. You will join a core team of incredibly talented industry specialists and work with the very latest hardware and software technology.

The Person

The ideal candidate is passionate about software engineering and has the leadership skills to drive complex issues to resolution. They communicate effectively and work well with teams across AMD.

Requirements

  • Experience in high-performance computing, AI inference, GPU kernel development, or hardware-software co-design.
  • Strong proficiency in C++, Python, and CUDA/HIP with hands-on experience writing and optimizing GPU kernels for AI workloads.
  • Deep understanding of compiler infrastructure (LLVM, MLIR) and experience with compiler optimization passes targeting GPU architectures.
  • Experience with deep learning frameworks (PyTorch, TensorFlow) and inference serving systems (vLLM, SGLang, TorchServe, or equivalent).
  • Demonstrated ability to analyze and optimize GPU kernel performance: memory coalescing, occupancy tuning, register pressure management, and instruction-level optimization.
  • Strong mathematical foundations in numerical computing, linear algebra, and optimization algorithms.

Nice To Haves

  • Experience with AMD ROCm ecosystem, RCCL, and Instinct MI-series GPU architectures (MI300X, MI350X, MI355X).
  • Track record of publications in top-tier venues (FPGA, ICCAD, DAC, NeurIPS, ICML, or equivalent).
  • Experience building GNN-based models for EDA or compiler optimization problems.
  • Familiarity with quantization techniques (weight-activation quantization, mixed-precision inference, FP4/FP6/FP8) for production deployment.
  • Experience with high-performance systolic array and attention kernel design for GPU accelerators.
  • Contributions to open-source HPC or AI libraries (BLAS, graph analytics, sparse solvers).
  • Experience with distributed inference systems, multi-GPU serving at scale, and rack-level AI infrastructure.
  • Background in algorithm-hardware co-design and performance modeling across multiple GPU generations.

Responsibilities

  • Design, optimize, and benchmark AI inference pipelines for large language models (LLMs), vision-language models (VLMs), and transformer architectures on AMD Instinct GPUs using ROCm, HIP, and MLIR.
  • Lead compiler-level optimizations for inference workloads, including LLVM instruction scheduling, MLIR dialect development, and performance-critical optimization passes for AMD GPU targets.
  • Develop and optimize high-performance GPU kernels (GEMM, attention mechanisms, custom operators) with deep attention to memory hierarchy, VGPR utilization, compute-communication overlap, and warp-level scheduling.
  • Drive integration and performance optimization of AMD Instinct GPUs within inference serving frameworks such as vLLM, SGLang, and TorchServe — ensuring day-zero readiness for new GPU launches.
  • Build forward-looking inference software for next-generation hardware: optimize for HBM4 memory hierarchies, new FP4/FP6 data types, and scale-up interconnects on MI450 and MI500 series GPUs.
  • Architect graph neural network (GNN) based quality-of-results (QoR) estimation models for compiler design space exploration and automated performance budgeting across GPU generations.
  • Collaborate with silicon architecture teams to provide software-informed feedback on next-generation Instinct GPU designs, ensuring inference workload characteristics are reflected in hardware decisions.
  • Develop quantization-aware training and post-training quantization pipelines to maximize model performance on AMD’s evolving data type support (FP8, FP6, FP4).
  • Contribute to AMD’s core compute libraries (BLAS, HPC, Graph) with a focus on inference-critical primitives and cross-generational performance portability.
  • Benchmark, profile, and resolve performance bottlenecks in distributed inference systems, including tensor parallelism scaling, RCCL communication patterns, and multi-GPU serving on Helios rack-scale infrastructure.

Benefits

  • Competitive compensation
  • Comprehensive benefits
  • Culture that values deep technical contribution and engineering excellence