Senior Software Engineer - Storage

Microsoft
Redmond, WA
1d

About The Position

Microsoft Silicon and Cloud Hardware Infrastructure Engineering (SCHIE) is the team behind Microsoft’s expanding Cloud Infrastructure and is responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for Microsoft's more than 200 online businesses, including Bing, MSN, Office 365, Xbox Live, Skype, OneDrive, and the Microsoft Azure platform, globally with our server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide, and we are looking for passionate, high-energy engineers to help achieve that mission.

As Microsoft's cloud business continues to grow, the ability to deploy new offerings and hardware infrastructure on time, in high volume, with high quality and at the lowest cost is of paramount importance. To achieve this goal, the SCHIE team is instrumental in defining and delivering measures of success for hardware design, qualification, fleet support, scale, and sustainability related to Microsoft cloud hardware.

The Azure Memory and Storage Center of Excellence (AMS CoE) is part of the SCHIE organization and focuses on the memory and storage devices going into cloud hardware servers. AMS provides memory and storage solutions to Azure and drives memory and storage suppliers to deliver high-quality products that meet our requirements.

We are looking for a Senior Software Engineer - Storage to scale Azure’s Fault Self-Healing and Failure Prediction systems, including the storage subsystem. You will develop the end-to-end technical design of the fault prevention ecosystem and drive its execution, spanning telemetry, ML models, automation, isolation logic, firmware deployment, and repair workflows, operating at hyperscale across millions of nodes. The role directly impacts customer uptime and fleet availability.
#SCHIE #Azure

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Nice To Haves

  • M.S. in Computer or Electrical Engineering
  • 8+ years of SSD firmware engineering development experience
  • 4+ years of NVMe and PCIe experience
  • Expertise in SSD virtualization, reliability, fault analysis, and live‑site operations
  • Ability to analyze storage system solutions and drive towards a recommendation based on data and objective reasoning
  • Ability to lead collaborative technical projects from conception to successful implementation
  • Demonstrable organizational, problem-solving, and prioritization skills
  • Ability to deal with ambiguity, resolve conflicts, prioritize multiple strategic and tactical options and drive issues to closure without compromising on quality

Responsibilities

  • Design and build best-in-class fleet resiliency systems for storage devices at scale
  • Develop scalable live monitoring capabilities and fault detection and repair solutions
  • Deploy SSD and Storage Accelerator firmware to the hyperscale cloud
  • Lead collaborative fault reduction projects with hardware, firmware, and software teams
  • Build automation to drive repair efficiency for storage operations in the production fleet
  • Collaborate with suppliers to design reliable, high-performance, high-quality storage devices
  • Analyze data to identify, prototype, and drive the implementation of technical and process improvements to increase the predictability, agility, and quality of Azure systems
  • Actively support Azure service stakeholders