SoC RAS Architect

Arm
San Jose, CA
99d
$253,300 - $342,700
Hybrid

About The Position

Arm is seeking an experienced SoC Reliability, Availability, and Serviceability (RAS) Architect to drive the RAS strategy for our next-generation SoCs. In this pivotal role, you will collaborate closely with design, verification, manufacturing, and product engineering teams to develop robust solutions that meet stringent reliability, aging, and lifecycle expectations across a diverse set of workloads and deployment environments. You will be responsible for defining architectural specifications and delivering end-to-end RAS solutions for SoCs targeting data center applications. This includes setting RAS goals, developing strategy and roadmaps, and partnering with hardware and software teams to realize innovative and efficient architectures.

Requirements

  • Master's degree (or higher) in Computer Engineering, Computer Science, Electrical Engineering, or a related discipline
  • 10+ years of experience in SoC development, with a focus on RAS architecture
  • Deep understanding of data center-class availability and reliability expectations
  • Expertise in fault detection, error handling, and resiliency techniques for large-scale compute platforms

Nice To Haves

  • Experience designing and deploying RAS strategies for Arm-based architectures
  • Familiarity with reliability modeling, stress and aging analysis, and silicon health monitoring
  • Understanding of firmware and software roles in RAS implementations
  • Exposure to industry standards and specifications such as RAS for PCIe, CXL, or JEDEC memory
  • Hands-on experience with failure analysis and silicon debug workflows
  • Background in safety-critical or high-availability systems (e.g., automotive, aerospace, cloud infrastructure)

Responsibilities

  • Define reliability, availability, and serviceability (RAS) requirements for next-generation SoCs
  • Architect scalable RAS solutions that balance power, performance, and area (PPA) while meeting customer and market needs
  • Develop and guide implementation of reliability-aware design techniques such as ECC, parity, error logging, and error detection and mitigation strategies
  • Lead RAS efforts throughout the product lifecycle, collaborating with front-end design, physical implementation, and verification teams
  • Align with cross-functional teams to ensure compliance with data center-class reliability and availability standards

Benefits

  • Competitive compensation structure
  • Paid Time Off (PTO)
  • Sabbaticals
  • Parental bonding leave

What This Job Offers

Industry

Professional, Scientific, and Technical Services

Education Level

Master's degree

Number of Employees

5,001-10,000 employees
