Lead GFX Architect, RAS (Reliability, Availability, and Serviceability)

Advanced Micro Devices, Inc., Austin, TX
Hybrid

About The Position

As an AMD Lead GFX Architect (Reliability, Availability, and Serviceability), you will work at the intersection of GPU and MI Accelerator features and RAS requirements. You will collaborate with GFXIP, AI, and RAS experts and industry pioneers to innovate new RAS features and to enhance and evaluate next-generation architectures for data center, automotive, and emerging market needs. You will be responsible for the RAS features of next-generation GFXIP deployments, spanning data center GPUs and NPUs, desktop and client GPUs and NPUs, and automotive GFXIP. You will develop novel technology and architectural solutions, write specifications, analyze reliability measures (FIT, DPPM, MTBF, and MTBR) and their tradeoffs against PPA (performance, power, and area) and security, and work with SOC RAS architects and business units to ensure that AMD's parts meet increasingly demanding RAS requirements and achieve and maintain best-in-class status. Because the RAS architect is a cross-functional technical leader, this role requires technical breadth, problem-solving ability, and strong soft skills.

Requirements

  • Excellent communication and presentation skills, demonstrated through technical publications, presentations, trainings, etc.
  • Extensive experience designing and/or architecting features of VLSI circuits.
  • Experience designing and architecting RAS features, with an up-to-date understanding of industry needs and RAS feature trends.
  • Undergraduate degree required: Bachelor's, Master's, or PhD in Computer Engineering or Electrical Engineering.
  • Mastery of logic and circuit design principles, and of the hardware principles and methods relevant to more complex systems and platforms, is preferred.

Nice To Haves

  • Experience in computer architecture and reliability, with a strong understanding of the physical interference and breakdown mechanisms that drive the need for RAS protections.
  • Familiarity with RAS protection features such as storage element parity and ECC, bus parity, heartbeat monitors, deferred error handling, watchdog timers, and the impact of resulting resets.
  • Computing and graphics architecture.
  • RTL design and/or verification.
  • Failure rates and figures of merit.
  • Background in reliability analysis.

Responsibilities

  • Lead the definition of GFXIP RAS features and capabilities, establish architecture requirements, and write architectural specifications for RAS features.
  • Work with company RAS and industry RAS experts to identify trends and likely future needs.
  • Research new reliability features for next generation GFX IP to avoid pitfalls and improve reliability per clock and per cost.
  • Develop innovative design and architecture procedures that make decisions about including or excluding RAS features, and about their levels of protection, more efficient.
  • Negotiate with stakeholders on RAS objectives.
  • Guide HW design teams in evaluating and improving the RAS protection and coverage of their implementations.
  • Hire, manage, and train a small technical team focused on RAS objectives.

Benefits

  • AMD benefits at a glance.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Number of Employees

5,001-10,000 employees