Principal Software Engineer - Storage

Microsoft, Redmond, WA
1d

About The Position

Microsoft Silicon and Cloud Hardware Infrastructure Engineering (SCHIE) is the team behind Microsoft’s expanding Cloud Infrastructure and is responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for Microsoft's more than 200 online businesses, including Bing, MSN, Office 365, Xbox Live, Skype, OneDrive, and the Microsoft Azure platform, globally with our server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide, and we are looking for passionate, high-energy engineers to help achieve that mission.

As Microsoft's cloud business continues to grow, the ability to deploy new offerings and hardware infrastructure on time, in high volume, with high quality, and at the lowest cost is of paramount importance. To achieve this goal, the SCHIE team is instrumental in defining and delivering measures of success for hardware design, qualification, fleet support, scale, and sustainability related to Microsoft cloud hardware.

The Azure Memory and Storage Center of Excellence (AMS CoE) is part of the SCHIE organization and focuses on the memory and storage devices that go into cloud hardware servers. AMS provides memory and storage solutions to Azure and drives memory and storage suppliers to deliver high-quality products that meet our requirements.

We are looking for a Principal Cloud Engineer - Storage to scale Azure’s Fault Self-Healing and Failure Prediction systems. You will own the end-to-end technical design and execution of the fault prevention ecosystem, spanning telemetry, ML models, automation, isolation logic, firmware interactions, and repair workflows, operating at hyperscale across millions of nodes. The role directly impacts customer uptime and fleet availability.

Requirements

  • Bachelor's Degree in Computer Science or a related technical field AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
  • Other: Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Nice To Haves

  • M.S. in Computer or Electrical Engineering
  • 12+ years of SSD firmware engineering development experience
  • 8+ years of NVMe and PCIe experience
  • Deep expertise in SSD virtualization, reliability, fault analysis, and live‑site operations.
  • Experience leading end‑to‑end design decisions across detection, prediction, mitigation, and repair of SSDs in hyperscale environments.
  • Experience designing component‑agnostic reliability frameworks that apply across different hardware components.
  • Proven ability to build automation-heavy systems that operate safely at hyperscale.

Responsibilities

  • Design and build best-in-class fleet resiliency systems for storage devices at scale
  • Develop scalable live monitoring capabilities, fault detection and repair solutions
  • Design features for SSDs and Storage Accelerator firmware deployment
  • Lead collaborative fault reduction projects with hardware, firmware, and software teams
  • Build automation to drive repair efficiency for storage operations in the production fleet
  • Collaborate with suppliers to design reliable, high-performance, high-quality storage devices
  • Analyze data to identify, prototype, and drive the implementation of technical and process improvements to increase the predictability, agility, and quality of Azure systems
  • Actively support Azure service stakeholders