Systems Design Engineer

Advanced Micro Devices, Inc., Austin, TX
4d | Onsite

About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE: The Data Center Platform Engineering Group (DPEG) is responsible for deploying, validating, and sustaining AMD’s most advanced CPU and GPU platforms inside large-scale data center environments. This team operates at the intersection of silicon, firmware, systems, and real-world infrastructure, ensuring platforms perform reliably under demanding workloads. As a Senior Systems Design Engineer, you will be a hands-on technical owner for platform deployment and availability in a live data center setting. You’ll work directly with cutting-edge GPU/CPU systems, diagnosing complex hardware, firmware, power, thermal, networking, and clustering issues that occur at scale. This role offers deep exposure to how next-generation compute platforms behave in production, close collaboration with cross-functional engineering teams and external partners, and the opportunity to drive continuous improvements that directly impact system reliability and uptime. This is an on-site, fast-paced environment where you will learn by solving real problems, influencing platform readiness, and working with highly experienced engineers across the hardware and software stack.

THE PERSON: The ideal candidate is a self-driven systems thinker who enjoys owning complex technical problems from initial failure through root cause and resolution. You are naturally curious, methodical in your debugging approach, and comfortable navigating ambiguity in large, interconnected systems. You communicate clearly with both technical and non-technical stakeholders, collaborate well across disciplines, and are confident working with vendors and partners. You take pride in driving issues to closure, improving processes along the way, and mentoring or guiding others when needed.

Requirements

  • Systems engineering experience supporting complex CPU, GPU, or SoC-based platforms
  • Platform or system-level validation, bring-up, or design reliability experience
  • Debugging expertise across BIOS, BMC, Linux, and hardware interfaces
  • Familiarity with test automation and failure-analysis methodologies
  • Experience working with OEM, ODM, or hardware vendors
  • Knowledge of high-speed interconnects such as PCIe Gen5
  • Hands-on experience using hardware lab equipment (e.g., scopes, programmers, system bring-up tools)

Nice To Haves

  • Exposure to FPGA, CPLD, or firmware development environments

Responsibilities

  • Manage platform deployment, availability, and stability for data center CPU/GPU systems
  • Lead system-level debugging efforts involving hardware, firmware, Linux, networking, power, and thermal behavior
  • Develop structured debug strategies, validation flows, and failure-analysis methodologies
  • Track daily activities, prioritize issues, assign work, and monitor progress to resolution
  • Collaborate with silicon, firmware, validation, networking, and operations teams to assess risks and requirements
  • Execute hands-on laboratory validation to ensure systems operate as intended prior to and during deployment
  • Partner with OEMs, ODMs, and vendors to resolve issues and improve platform reliability
  • Drive continuous improvement in tools, processes, and platform readiness

Benefits

  • AMD benefits at a glance.