About The Position

It is a mobile-first, cloud-first world, and we’re empowering it. Microsoft Azure is at the heart of the Microsoft Cloud, providing the backend infrastructure for hyperscale, distributed, and dynamic computing. Our team within Azure provides the software platform that enables internal Microsoft services (including Office 365, Bing.com, Xbox Live, Skype, and OneDrive), as well as many external customers, to run large-scale, mission-critical cloud applications for their businesses. Some of the many areas we are tackling include ring 0 and ring -1 core infrastructure services, five 9s (99.999%) reliability, fault tolerance, distributed service monitoring, operational efficiency across the datacenter hardware lifecycle, performance metrics collection and analysis, alerting, visualization, device operations, and coordination of node diagnostics and repairs.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, or Rust.
  • OR equivalent experience.

Nice To Haves

  • Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, or Rust.
  • OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, or Rust.
  • OR equivalent experience.

Responsibilities

  • Design new features for Microsoft Cloud internal infrastructure software.
  • Keep infrastructure services running and deliver code updates on a regular cadence to improve performance and reliability.
  • Collaborate with appropriate stakeholders to determine user requirements for scenarios.
  • Act as a Designated Responsible Individual (DRI), monitoring systems, product features, and services for degradation, downtime, or interruptions, and, for simple problems, recommend actions to restore service by following the playbook.
  • Review current developments and proactively seek new knowledge that will improve the availability, reliability, efficiency, observability, and performance of products, while also driving consistency in monitoring and operations at scale.
© 2024 Teal Labs, Inc