Senior AI Switch Systems Design Engineer

Advanced Micro Devices, Inc.
Secaucus, NJ
Onsite

About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

MTS SYSTEMS ENGINEER

THE ROLE:

We are looking for a hands-on, technically sharp system design engineer to join our growing team and lead the bring-up of cutting-edge scale-up switches at the heart of next-generation AI rack infrastructure. As a key contributor, you will bring deep expertise in high-speed Ethernet, server management, and platform validation to drive switch platforms from initial power-on through full system qualification. In this role, you will take full ownership of bring-up execution, apply your debugging skills to solve complex multi-layer problems, and collaborate closely with hardware, firmware, and software teams to deliver production-ready systems.

Requirements

  • You're a highly motivated team player with a strong development background, a problem-solving mentality, excellent communication skills, the ability to prioritize tasks, and a willingness to learn and adapt.
  • Excellent teamwork skills and the ability to work independently.
  • Bachelor’s/Master’s degree in Computer Science or related field strongly preferred
  • Extensive hands-on experience in hardware bring-up, platform validation, or high-speed networking silicon characterization
  • Experience with high-speed switch ASICs (Broadcom TH6/Tomahawk series preferred) and familiarity with Broadcom's SDK/DAPI frameworks
  • Deep understanding of high-speed Ethernet standards (400GbE, 800GbE) including AN/LT (IEEE 802.3), RS-FEC / KP4-FEC, and PAM4 SerDes technology
  • Hands-on experience with PRBS testing, BER measurement, eye diagram analysis, and Snake/loopback traffic validation methodologies
  • Familiarity with LinkCAT or equivalent PHY/link characterization tools
  • Experience with server management protocols: IPMI, Redfish/OpenBMC, KCS, IPMB, and PLDM for out-of-band control and telemetry
  • Proficiency in Python for test automation, log parsing, and data analysis
  • Strong debugging skills — comfortable working across hardware (oscilloscope, protocol analyzer), firmware logs, and software traces to isolate root cause
  • Experience reading schematics and PCB layout to correlate signal integrity observations with hardware design
  • Excellent communication skills with the ability to document findings clearly and collaborate across multidisciplinary teams
  • Experience with high-density switch/router platforms or AI/ML fabric infrastructure is a strong plus

Responsibilities

  • Lead the system bring-up and validation of state-of-the-art AI scale-up switches purpose-built for high-density GPU compute racks, from initial power-on through full system validation
  • Perform high-speed SerDes and link bring-up, including configuring and validating Auto-Negotiation/Link Training (AN/LT), tuning TX equalization, and characterizing signal integrity across 200G/400G/800G interfaces
  • Execute comprehensive link qualification testing using PRBS (Pseudo-Random Binary Sequence), Snake Traffic loopback testing, and FEC (Forward Error Correction) analysis to validate BER performance at scale
  • Utilize LinkCAT and Broadcom SDK tools to characterize port performance, diagnose link failures, and validate PHY configurations across large port counts
  • Integrate and validate server management infrastructure including BMC/IPMI, Redfish API, and out-of-band management workflows for automated bring-up and health monitoring
  • Develop and maintain bring-up scripts and test automation (Python) to accelerate validation coverage across chassis configurations
  • Debug complex system-level failures spanning hardware, firmware, and software, including signal integrity issues, firmware crashes, and management plane anomalies, and drive each issue to root cause
  • Collaborate with hardware, firmware, and software teams to reproduce failures, document findings, and verify fixes across platform revisions
  • Maintain detailed bring-up documentation, test reports, and issue tracking throughout the product development lifecycle

Benefits

  • AMD benefits at a glance.