About The Position

Microsoft Azure Storage is a highly distributed, massively scalable, and ubiquitously accessible cloud storage platform. Azure Storage already runs at exascale (storing exabytes of data), and we will scale our designs over the next decade to support zettascale (storing zettabytes of data).

As a Software Engineer on Azure Storage, you will design, implement, optimize, and maintain core components of the Azure Storage stack that operate at massive global scale. You will work across the full storage lifecycle, including architecture, implementation, testing, deployment, and operational support, to deliver highly reliable and performant backend systems that power Azure’s storage platform.

A key part of this role is driving observability across the storage fleet. You will build and enhance telemetry, monitoring, alerting, diagnostics, and analytics capabilities that ensure deep visibility into system behavior, performance, reliability, and efficiency. Your work will help engineers detect anomalies earlier, understand system health more accurately, and identify root causes faster when issues arise.

You will also partner with data scientists and engineering peers to design and integrate Artificial Intelligence/Machine Learning (AI/ML)-driven intelligence into the storage platform. This includes using machine learning techniques and statistical methods to detect patterns, optimize operations, predict failures, and automatically tune systems. You’ll experiment with data, validate models, and ship production-ready intelligent features that directly improve storage efficiency and reliability for Azure’s global customer base.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

If you share our values, love working with data, and want to solve some of the most exciting challenges on a massive product like Azure Storage, this could be the perfect position for you.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C++, Rust OR equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Nice To Haves

  • Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C++, Rust OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C++, Rust OR equivalent experience.

Responsibilities

  • Architect, implement, and maintain reliable, scalable, and secure distributed systems and Application Programming Interfaces (APIs); write high‑quality, extensible, and diagnosable code with strong attention to performance, resiliency, and maintainability.
  • Drive end‑to‑end observability and operational excellence by instrumenting services with robust telemetry, logging, tracing, alerting, and dashboards; improve Mean Time to Detect (MTTD) / Mean Time to Restore (MTTR) through proactive monitoring, health checks, and data‑driven SRE practices that raise availability and efficiency at global scale.
  • Act as Designated Responsible Individual (DRI) for critical components and mentor engineers. Hold on‑call ownership to triage and resolve incidents; lead root‑cause analysis and postmortems; coach teammates on production readiness, safe deploys, rollback strategies, and sustainable operations.
  • Collaborate with Product Management (PM), User Experience (UX) and partner teams to clarify user and business needs; decompose work into project/release plans and executable items; incorporate feedback from customers, telemetry, and support channels into iterative improvements.
  • Integrate Artificial Intelligence / Machine Learning (AI/ML) to enhance storage intelligence and efficiency by partnering with data scientists to design and productionize ML capabilities (e.g., anomaly detection, capacity forecasting, auto‑tuning); build scalable pipelines; define online/offline evaluation, A/B experiments, and telemetry‑based model iteration.
  • Apply modern engineering best practices across the lifecycle. Uphold high bars for code reviews, testing (unit/integration/e2e), security, reliability, and performance; reuse and refactor shared components; enforce Continuous Integration / Continuous Delivery (CI/CD) quality gates and incremental rollouts to reduce risk.
  • Elevate technical strategy through data‑driven insights. Use metrics, experiments, and statistical analyses to inform design trade‑offs and platform investments; advocate AI/ML and observability standards; share learnings and patterns that drive consistency in monitoring and operations across teams.