Software Engineering IC5

Microsoft, Redmond, WA

About The Position

Are you excited by the challenge of redefining how people explore and analyze massive datasets? Do you want to build the next-generation platform for real-time insights at cloud scale? Microsoft’s Azure Data engineering team is leading the transformation of analytics with products spanning databases, data integration, big data analytics, messaging and real-time analytics, and business intelligence. Our mission is to build the data platform for the age of AI, powering a new class of data-first applications and driving a data culture.

Within Azure Data, the Kusto team delivers a high-scale log analytics platform used across Microsoft and by thousands of customers globally, including enterprises, startups, and partners. Kusto ingests more than 400 PB of data daily across Microsoft Azure and Microsoft Fabric, making it the engine behind mission-critical observability and analytics scenarios.

We are looking for a Principal Software Engineer to join our team. In this role, you will help drive the Kusto revolution and make it THE technology for log search and analytics worldwide. You will also have the opportunity to evolve the Real-Time Intelligence (RTI) solution in Microsoft Fabric, with the goal of redefining the real-time data market.

Requirements

  • 10+ years of experience building and operating large‑scale distributed systems in production cloud environments, with deep ownership of reliability, availability, and operational excellence.
  • Strong background in security‑aware system design, including operational security, access control, incident response, and compliance considerations.
  • Proven track record in site reliability engineering, DevOps, or infrastructure operations, including ownership of live‑site health and on‑call rotations.
  • Ability to communicate clearly with both technical and non‑technical stakeholders, especially during high‑pressure operational situations.

Responsibilities

  • Drive security‑first engineering practices by embedding security, compliance, and risk management into system design, deployment pipelines, and operational workflows.
  • Define and track operational KPIs such as SLOs, error budgets, deployment health, and security posture, using data to drive prioritization and accountability.
  • Partner across engineering, security, and platform teams to align on shared reliability and security goals, influencing roadmaps beyond your immediate organization.
  • Guide architecture and design decisions with a focus on fault tolerance, blast‑radius reduction, safe rollout strategies, and operational simplicity in large distributed systems.
  • Lead deep technical investigations during complex incidents, including cross‑stack forensics, root‑cause analysis, and definition of durable corrective actions.