Principal AI Systems Design Engineer

Advanced Micro Devices, Inc
Santa Clara, CA
Hybrid

About The Position

The AI Customer Engineering organization is looking for a Principal AI Systems Design Engineer to help customers ramp successfully with AMD GPU platforms. This is a hands‑on, customer‑facing role leading full-stack debug of AI infrastructure, with a focus on high-speed memory standards such as DDR and HBM.

Requirements

  • Significant hands-on experience with high-speed memories such as DDR5/LP5/HBM3, including silicon bring-up and debug.
  • Good understanding of memory controllers and PHYs, training algorithms and FW interactions, ECC, manufacturing, and reliability mechanisms, with hands-on system development experience.
  • Experience debugging complex full-stack SW/FW/HW issues is a must.
  • Ability to understand memory bottlenecks throughout the system and to validate the components connected to the GPU SoC (HBM, VRs, internal networking).
  • Strong communication skills are essential for working with the different owners of the functional code stack and for driving issues via phone calls, chat messages, and e-mails.
  • Bachelor’s/Master’s degree in Computer Science or a related field

Nice To Haves

  • Significant experience in SoC architecture, memory standards, and debug of complex system-level issues
  • Strong debug capabilities for memory protocols on server CPU/GPU/FPGA in single- and multi-node platforms
  • Hands-on troubleshooting experience in solving technical issues; owns the problem and drives it to resolution
  • Hands-on experience using industry debug tools and scopes, as well as examining board-level signal and power integrity
  • Good balance of hardware, architecture, and software expertise
  • Solid programming skills in Python, C, or C++
  • Skilled in scripting languages such as Perl, Ruby, and shell scripting
  • Experience running and analyzing system benchmarks such as JEDEC standards
  • Proficient with revision control (Git, SVN, and CVS)
  • Proven ability to drive resolution of critical problems within a lab or datacenter
  • Experience building relationships with external customers/partners and helping resolve problems on customer platforms
  • Hands-on experience with hardware in a silicon/system lab environment is preferred.

Responsibilities

  • Drive resolution of customer issues using innovative debug methods, with the goal of finding root cause and enabling customers in a fast-paced environment
  • Provide technical leadership on issue debug, working closely with SoC, Memory Technology, Design, Validation, and Manufacturing teams to drive to root cause
  • Set up hardware systems and probe components in the system; check electrical and power signals, and validate the system using different AI workloads
  • Communicate and document flows and methods for bring-up, system initialization, running stress workloads, and debug
  • Lead technical presentations demonstrating a good understanding of customer applications, infrastructure, and system design
  • Be a leader and mentor to the operations team; be hands-on and lead by example

Benefits

  • AMD benefits at a glance.