Principal Customer Experience Program Manager

Microsoft
Mineral, VA
Hybrid

About The Position

Are you a customer-obsessed, engineering-minded program leader who thrives in high-stakes, regulated environments? Do you want to build a new function from the ground up, one that prevents customer outages before they happen and transforms how Microsoft supports its most sensitive cloud customers? Join Advanced Cloud Engineering & Supportability (ACES), a global Azure engineering support organization within Azure Engineering Operations (EngOps). ACES delivers engineering-led, world-class support across Azure's Government and Sovereign cloud portfolio, including US Government (Fairfax) and National Partner Clouds in France (Bleu), Germany (Delos), and Singapore (Merlion).
We are building a new Gov Customer Resiliency function within ACES that brings proactive reliability engineering in-house for Government customers. This is not reactive support. It is about changing the probability, blast radius, and recovery time of customer outages through engineering-led detection, readiness, and prevention. The role involves leading two interconnected workstreams under ACES Sovereign & Government: Gov Customer Resiliency (60%) and Sovereign Cloud Operations & Readiness (40%).
For Gov Customer Resiliency, you will build and operate a new function from scratch, starting with a named high-profile Government customer and scaling to a portfolio of 3-5 top Gov/Azure Engineering Direct customers. This function brings proactive resiliency capabilities in-house for Government customers under the Sovereign & Government business. You will own the full resiliency lifecycle: proactive detection and monitoring, incident and crisis management coordination, post-incident RCA and problem management, architecture and DR guidance, and parity closure between Government and Commercial cloud environments. This is a build + run role: you will first shadow and codify the operating model, then own and scale it.
For Sovereign Cloud Operations & Readiness, you will drive support readiness, operational maturity, and customer experience strategy across Microsoft's Sovereign Cloud portfolio (Bleu, Delos, Merlion). This includes readiness frameworks for new Sovereign cloud launches, escalation flow design, CRI playbooks, Sev handling standards, cross-cloud staffing models, and compliance-aligned operational processes and playbooks. You will partner closely with Sovereign delivery leadership, Azure engineering, and regional National Cloud Operating Entity (NCOE) partners to ensure Sovereign clouds are support-ready, compliant, and capable of delivering exceptional customer outcomes from Day 1.
This role is strategic because it sits at the intersection of two of ACES' most significant investments. Gov Customer Resiliency brings proactive reliability engineering in-house for Government customers, moving the organization from reactive support to engineered prevention. Sovereign Cloud Readiness ensures Microsoft's most compliance-sensitive cloud environments are support-ready from Day 1, protecting customer trust. The person in this role will build a new function, run it customer-facing, and scale it across the most critical cloud environments Microsoft operates, defining how Microsoft supports its highest-trust customers.

Requirements

  • Bachelor's Degree in Computer Science, Engineering, Data Science, Math, Business, or related field
  • 6+ years' experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience.
  • US Citizenship & Citizenship Verification: This position requires verification of citizenship due to citizenship-based legal restrictions. Specifically, this position supports United States federal, state, and/or local government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, and as a condition of employment, the successful candidate’s US citizenship will be verified with a valid passport.
  • Microsoft Cloud Background Check: Required upon hire/transfer and every two years thereafter.

Nice To Haves

  • Master's Degree in Computer Science, Engineering, Data Science, Math, Business, or related field AND 8+ years' experience in engineering, product/technical program management, data analysis, or product development OR Bachelor's Degree in Computer Science, Engineering, Data Science, Math, Business, or related field AND 12+ years' experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience.
  • Experience in CRE, SRE, ACE, or operational reliability roles within a cloud hyperscaler environment.
  • Hands-on experience with resiliency tooling, platform monitoring and similar detection and incident management systems.
  • Deep knowledge of Sovereign compliance, data residency, and geo-centric architecture models (e.g., EU Data Boundary, government cloud isolation requirements).
  • Track record of executive-level customer engagement, including the ability to lead confidence calls, MBRs, and exec-level progress reviews with enterprise customers.
  • Demonstrated experience in customer-facing resiliency, reliability engineering, or incident management roles, including proactive detection, crisis coordination, or post-incident program management. Customer- and field-facing experience driving deep technical and architecture conversations, including resiliency workshops.
  • Experience working with government agencies, sovereign entities, or regulated industries, with strong understanding of their missions, operating models, compliance requirements, and IT environments.
  • Strong understanding of Azure services and cloud technologies, including monitoring, diagnostics, incident response tooling, and infrastructure architecture.
  • Proven ability to build new functions or programs from scratch: defining the charter, playbooks, metrics, and operating cadences, and scaling across customers.
  • Exceptional cross-org stakeholder management skills, with the ability to drive alignment across engineering, support delivery, product teams, and customer-facing partners without direct authority.
  • Experience working effectively across multiple geographies, cultures, and organizational boundaries.

Responsibilities

  • Stand up a new proactive resiliency function for Government cloud customers: define the charter, build playbooks, establish operating cadences, and own the end-to-end engagement model
  • Own the full resiliency lifecycle: proactive detection and monitoring, incident and crisis coordination, post-incident root cause analysis, and architecture/DR guidance
  • Drive Gov-vs-Commercial parity closure across monitoring, tooling, incident response, and remediation maturity
  • Lead resiliency and reliability workshops and customer conversations, including with Field enablement teams, to drive customer value
  • Scale the resiliency model from a single anchor customer to a portfolio of 3-5 top Government customers using a repeatable, metrics-driven playbook
  • Develop and deliver internal enablement content, such as training materials, case studies, and learning sessions, to embed resiliency practices across Gov Support delivery teams and scale knowledge beyond the immediate function
  • Define and report on success metrics including mean time to detect, time to engage, incident recurrence, proactive detection rates, and customer confidence
  • Leverage telemetry, monitoring data, and trend analysis to proactively identify and address emerging risks before they become customer-reported incidents
  • Partner with reliability engineering, product teams, and delivery leadership to ensure resiliency insights feed into upstream engineering actions, product improvements, and prevention strategies
  • Drive end-to-end support readiness (people, process, technology) for Microsoft's Sovereign Cloud portfolio across multiple regions and future launches
  • Design escalation pathways, incident handling standards, and compliance-aligned operational processes for Sovereign environments
  • Own readiness frameworks for new Sovereign cloud launches and influence design decisions upstream to prevent customer impact
  • Lead operational reporting and insights; translate data into risk assessments and executive-ready recommendations
  • Represent Sovereign and Government customer needs in cross-org forums, influencing priorities and investments to strengthen long-term customer trust
  • Leverage AI, automation, and data-driven insights to proactively identify gaps, reduce risk, and improve customer experience at scale
  • Extend the Gov Resiliency playbook to Sovereign clouds as they mature, building a unified approach across regulated environments
  • Drive alignment across geographically distributed teams and operating partners spanning multiple countries and time zones
  • Embody our culture and values.