Principal SoC Performance Architect - Microbenchmarks

Advanced Micro Devices, Inc.
Austin, TX

About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

AMD is looking for an outstanding technical contributor to drive performance analysis, characterization, and optimization of next-generation Data Center GPU (DCGPU) platforms. This role focuses on extracting maximum performance across the full system stack (hardware, firmware, drivers, runtime, libraries, and workloads) through deep architectural understanding and data-driven methodologies. The engineer will develop and maintain microbenchmarks and system-level workloads spanning pre-silicon and post-silicon environments to enable performance validation, debug, and optimization.

Requirements

  • Proven experience working on highly parallel compute systems or SoCs (GPUs preferred)
  • Experience developing and maintaining microbenchmarks tied to architectural features
  • Strong exposure to performance analysis across pre-silicon and post-silicon environments
  • Solid understanding of GPU compute, memory systems, and interconnect architectures
  • Experience with profiling, tracing, and performance counter analysis
  • Ability to debug complex system-level performance issues across multiple layers
  • MS/PhD in Computer Engineering, Computer Science, or a related field
  • Excellent communication skills and ability to present complex performance insights clearly

Nice To Haves

  • 10–15+ years of experience in performance engineering for GPUs, HPC systems, or highly parallel SoCs
  • Strong understanding of GPU architecture, parallel computing, and memory hierarchies
  • Experience with microbenchmark development and system-level workload analysis
  • Hands-on experience with performance profiling tools (rocprof, Nsight, perf, etc.)
  • Experience analyzing AI/HPC workloads (LLMs, training, inference, communication libraries like RCCL/NCCL)
  • Strong background in hardware/software co-design and performance optimization
  • Familiarity with pre-silicon (simulation/emulation/models) and post-silicon performance workflows
  • Programming expertise in C/C++, Python; experience with GPU programming models (HIP, CUDA, OpenCL)
  • Strong analytical and debugging skills with a data-driven mindset
  • Experience working across the full software stack (compiler → runtime → kernels → system)
  • Exposure to performance modeling, scaling analysis, or competitive benchmarking is a plus
  • Bachelor’s or Master’s degree in a related discipline preferred

Responsibilities

  • Analyze and optimize performance of DCGPU systems across AI training, inference, and HPC workloads
  • Identify bottlenecks across hardware, firmware, drivers, runtime, libraries, and applications
  • Perform deep kernel-level and system-level profiling to understand performance behavior
  • Provide actionable insights to architecture, software, and design teams to improve performance
  • Design and develop targeted microbenchmarks to characterize GPU subsystems (compute, memory, interconnect, collectives)
  • Build representative system-level workloads reflecting real-world AI/HPC use cases
  • Ensure microbenchmarks correlate to application-level performance and architectural intent
  • Maintain and evolve benchmark suites across multiple GPU generations
  • Enable performance validation in pre-silicon environments (simulation/emulation/models)
  • Correlate performance data across pre-silicon models and post-silicon measurements
  • Develop methodologies to reuse workloads and microbenchmarks across the full lifecycle
  • Support bring-up and early silicon performance characterization
  • Work across the entire software stack: compiler, runtime, libraries, drivers, and firmware
  • Collaborate with ROCm / AI frameworks / kernel teams to improve performance
  • Analyze interactions between workload characteristics and hardware execution
  • Optimize key kernels (e.g., GEMMs, collectives, attention) and system-level behavior
  • Develop and enhance performance measurement, profiling, and analysis tools
  • Enable scalable, repeatable workflows for benchmarking and analysis
  • Build automation for performance regression tracking and reporting
  • Contribute to unified infrastructure spanning pre-silicon and post-silicon environments
  • Partner with SoC architecture, GPU IP, software, and system teams
  • Influence design decisions using data-driven performance insights
  • Collaborate with competitive analysis teams to understand gaps vs. industry platforms
  • Develop strong intuition and/or models for performance scaling and limits
  • Translate performance data into architectural feedback for future GPU designs
  • Support competitive benchmarking and performance projections

Benefits

  • AMD benefits at a glance