About The Position

In this role, you’ll work to shape the future of AI/ML hardware acceleration. You will have an opportunity to drive cutting-edge TPU (Tensor Processing Unit) technology that powers Google's most demanding AI/ML applications. You’ll be part of a team that pushes boundaries, developing custom silicon solutions that power the future of Google's TPU. You'll contribute to the innovation behind products loved by millions worldwide, and leverage your design and verification expertise to verify complex digital designs, with a specific focus on TPU architecture and its integration within AI/ML-driven systems.

As a Quality and Reliability Engineer for Google Cloud, you will lead the development of Design-for-Reliability guidelines and drive the adoption of advanced technologies to optimize silicon production and reliability. You will be responsible for ensuring that High Performance Computing (HPC) SoC products meet stringent quality requirements by collaborating across design, manufacturing, and hardware teams to execute comprehensive test plans. Additionally, you will own the cross-functional investigation and root-cause analysis of integrated circuit (IC) issues to develop effective solutions in a production environment.

The AI and Infrastructure team is redefining what’s possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide. We're the driving team behind Google's groundbreaking innovations, empowering the development of our cutting-edge AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.

Requirements

  • Bachelor's degree in Electrical Engineering, Computer Engineering, Computer Science, or a related field, or equivalent practical experience.
  • 8 years of experience in reliability or product quality engineering (e.g., working on ICs, SoCs, or microprocessors).
  • Experience with silicon or semiconductor manufacturing or fab processes (e.g., CMOS, FinFET, or device physics).
  • Experience with advanced manufacturing nodes (e.g., 5nm, 3nm) or assembly (e.g., 2.5D, 3D, or chiplet packaging).
  • Experience in a production or manufacturing environment (e.g., Failure Analysis, Root Cause Analysis, or RMA processes).

Nice To Haves

  • Master's degree or PhD in Electrical Engineering, Computer Engineering or Computer Science, with an emphasis on computer architecture.
  • Experience with chiplets and high-power devices.
  • Experience in data analytics to identify commonalities and abnormalities.
  • Experience in semiconductor reliability and manufacturing processes (fab, assembly, test), or IC and packaging failure mechanisms and related failure analysis.
  • Knowledge of Design-for-Reliability guidelines and implementation techniques.
  • Familiarity with test methods and hardware for silicon qualification (e.g., HTOL chambers, ESD, LU).

Responsibilities

  • Own development of Design-for-Reliability guidelines, collaborating with subject area experts (e.g., SER, EMIR, PERC, HVDRC, and margining).
  • Facilitate technology adoption to optimize production and reliability (embedded sensors, in-field monitor/debug, etc.).
  • Collaborate with design, manufacturing, silicon engineering, and hardware/component quality teams to ensure High Performance Computing (HPC) SoC silicon products meet quality and reliability requirements (e.g., mission profile, DPPM/FIT, and aging).
  • Partner with cross-functional organizations to design and execute quality and reliability test plans (e.g., HTOL, ELFR, ESD/LU, b/HAST, and THB) and production reliability methods (e.g., HVS).
  • Own cross-functional investigation of IC quality and reliability issues to identify root causes and develop solutions (RMA Triage, Analytics, Failure Analysis, etc.).

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Number of Employees

5,001-10,000 employees
