System Validation Technical Program Manager - Server

Advanced Micro Devices, Inc.
Austin, TX
Onsite

About The Position

AMD's Data Center Platform Engineering Group (DPEG) is responsible for designing, building, and delivering innovative technology infrastructure, including cloud-enabling server solutions for leading cloud and telecom providers. As a Technical Program Manager within the DC Platform Engineering organization, this role involves leading the management and execution of AI and hyperscale server platform programs throughout the entire product lifecycle, from definition and planning through development, production, release, and end of life.

The Validation Technical Program Manager specifically serves as the key technical leader for end-to-end rack-level server solution validation. This includes defining, developing, and executing comprehensive validation test strategies to ensure product functionality and compliance with specifications, covering firmware and software stacks. The scope of validation spans board, system, and rack-level integration, encompassing electrical, power, mechanical, thermal, shock and vibration, compliance, system integration, platform validation, reliability, and firmware domains.

The individual will lead triage and debug efforts, identify root causes, and drive timely issue resolution, collaborating closely with architecture, mechanical design, platform, firmware, and system engineering teams across all design phases. Strong judgment is required for defining test specifications, requirements, strategies, and methodologies for electrical, firmware, and platform-level validation. The role also involves managing test planning and execution, schedules, tracking requirements, summarizing results, developing validation documentation, and participating in internal and customer-facing program core team meetings to communicate status and ensure milestone readiness. Additionally, the position supports Internal Server R&D hardware validation initiatives and contributes to system validation automation efforts.

Requirements

  • Experience managing technical programs or workstreams across multiple cross-functional teams with competing priorities.
  • Demonstrated ability to plan, track, and execute programs against defined requirements, schedules, and milestones.
  • Proficiency in program management fundamentals, including scope definition, schedule management, risk tracking, dependency management, and status reporting.
  • Experience driving cross-functional alignment, facilitating reviews, and coordinating execution across engineering, validation, and partner teams.
  • Ability to identify, communicate, and escalate risks and issues with clear mitigation plans.
  • Strong written and verbal communication skills to present program status, risks, and readiness to internal and customer-facing stakeholders.
  • Experience using program tracking tools (e.g., Jira, Azure DevOps, dashboards, spreadsheets) to monitor progress and report execution health.
  • Experience in server and datacenter validation, including board, system, and rack-level hardware and software qualification.
  • Working knowledge of server platform architectures, including PCIe, memory, storage, networking, compute, management, and system-level integration.
  • Experience developing and executing validation test plans, test cases, and tracking progress against requirements and milestones.
  • Ability to support triage and debug efforts, manage defect tracking, and drive issue resolution in collaboration with cross-functional engineering teams.
  • Experience in automation and scripting (e.g., Python, PowerShell) to improve validation efficiency and repeatability.
  • Solid understanding of validation methodologies to support reliable platform performance and milestone readiness.
  • Bachelor’s or Master’s degree in Electrical Engineering, Computer Engineering, System Engineering, or Computer Application.

Responsibilities

  • Own server platform validation execution across board, system, and rack levels, ensuring alignment to program requirements and milestones.
  • Define and drive comprehensive validation test plans covering electrical, power, mechanical, thermal, shock & vibration, compliance, reliability, firmware, and system integration domains.
  • Partner closely with architecture, mechanical, platform, firmware, and system engineering teams to coordinate validation activities throughout the design lifecycle.
  • Apply sound technical judgment in developing test specifications, requirements, and methodologies for electrical, firmware, and platform-level validation.
  • Manage validation schedules, requirement tracking, and execution progress, ensuring timely completion and clear communication of results.
  • Lead triage and debug activities, driving root cause identification and issue resolution while escalating risks as needed.
  • Prepare and maintain validation documentation, including test summaries, status reports, and milestone readiness inputs.
  • Represent validation in internal and customer-facing program meetings, providing accurate status, risk awareness, and milestone readiness updates.
  • Support Internal Server R&D hardware validation efforts as required.
  • Contribute to validation automation efforts to improve test coverage, efficiency, and repeatability.