Senior Software Engineer - Chaos Engineering

Microsoft, Redmond, WA
Posted 74d ago
$119,800 - $234,700

About The Position

The High Availability (HA) team, part of M365 Core, is seeking a Senior Software Engineer - Chaos Engineering. This role is crucial because HA has been a cornerstone of the Substrate backend solution. We continue to explore opportunities for improving and optimizing service reliability, and our drive to provide the best service to our customers goes beyond optimizing the storage stack. We work relentlessly on reducing Microsoft's capital and operational expenses while maintaining reliable 4.5 9s availability. To achieve that, HA has extended its charter beyond traditional database availability and redundancy solutions toward optimizing power efficiency, platform costs, and networking costs. The latter will be the major focus of a talented engineer who decides to join our team.

Chaos Engineering is the discipline of experimenting on a system to build confidence in the system's capability to withstand turbulent conditions in production. As part of the Chaos team in HA, you will work closely with partners in Azure, Exchange Online (EXO), and Microsoft Research (MSR) to build the next generation of the Chaos platform for Substrate. The platform will validate the resilience, architecture choices, predictability, and even the monitoring and incident response processes of critical components in M365 distributed systems.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • 3+ years of software design and development experience with backend services.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.

Nice To Haves

  • Cloud and services experience; Azure cloud experience is a plus.
  • Experience writing services and micro-services on the middle or back-end tier.
  • Experience with networking layer optimization and tuning, deploying and maintaining large-scale cluster products, and defining and testing performance characteristics of backend solutions.
  • Analytical skills with systematic and structured approach to software design.
  • Experience building reliable and well-tested code.

Responsibilities

  • Own feature projects that directly impact the behavior of the High Availability component of Exchange Online (EXO), which reliably provides 4.5 9s of availability.
  • Write production, monitoring, and test code; create reports; and conduct performance analysis of the storage engine, database replication, and networking layer.
  • Research Chaos experiments and identify opportunities for testing the operational readiness of critical service components.
  • Engage with EXO, Azure, and MSR partners to build interfaces for a modern Chaos experience and to improve the resilience, predictability, and observability of M365 distributed systems.
  • Embody our Culture and Values.

Benefits

  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

What This Job Offers

Job Type

Full-time

Career Level

Senior

Industry

Professional, Scientific, and Technical Services

Education Level

Bachelor's degree