The High Availability (HA) team, part of M365 Core, is seeking a Senior Software Engineer - Chaos Engineering. This role is crucial because HA has been a cornerstone of the Substrate backend solution. We continue to explore opportunities to improve and optimize service reliability, and our drive to provide the best service to our customers goes beyond optimizing the storage stack. We work relentlessly to reduce Microsoft's capital and operational expenses while maintaining reliable 4.5 9s availability. To achieve that, HA has extended its charter beyond traditional database availability and redundancy solutions, towards optimizing power efficiency, platform costs, and networking costs. The latter will be the major focus of the talented engineer who decides to join our team.

Chaos Engineering is the discipline of experimenting on a system to build confidence in the system's capability to withstand turbulent conditions in production. As part of the Chaos team in HA, you will work closely with partners, namely Azure, Exchange Online (EXO), and Microsoft Research (MSR), to build the next generation of the Chaos platform for Substrate. The platform will validate the resilience, architecture choices, predictability, and even the monitoring and incident response processes of critical components in M365 distributed systems.
Job Type
Full-time
Career Level
Senior
Industry
Professional, Scientific, and Technical Services
Education Level
Bachelor's degree