About The Position

Microsoft’s Cloud Operations & Innovation (CO+I) is the engine that powers our cloud services. Our cloud datacenters run 24x7 and depend on reliable electrical and mechanical telemetry to operate safely, efficiently, and at scale. As a Senior Critical Environment Telemetry Engineer, you will own end-to-end delivery and lifecycle health of telemetry pipelines from Critical Environment (CE) systems, such as power and cooling, into Microsoft’s telemetry platforms. This role is ideal for an experienced industrial controls / automation engineer who can troubleshoot down to packet/register-level details and partner across engineering, operations, integrators, and vendors to deliver resilient, high-quality signals for mission-critical operations.

As a CO+I Senior CE Telemetry Engineer, you will play a key role in delivering the core infrastructure and foundational technologies for Microsoft's online services, including Bing, Office 365, Xbox, OneDrive, and the Microsoft Azure platform. As a group, CO+I is focused on the personal and professional development of all employees and offers training and growth opportunities, including Career Rotation Programs, Diversity & Inclusion trainings and events, and professional certifications.

Our infrastructure comprises a large global portfolio of more than 200 datacenters in 32 countries and millions of servers. Our foundation is built upon and managed by a team of subject matter experts working to support services for more than 1 billion customers and 20 million businesses in over 90 countries worldwide. With environmental sustainability and optimization at the forefront of our datacenter design and operations, we continue to grow and evolve as we meet the ever-changing business demands that hold Microsoft as a world-class cloud provider.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Requirements

  • Bachelor's Degree in Mechanical Engineering, Electrical Engineering, Controls/Automation Engineering, Industrial Engineering or related field AND 3+ years technical experience in software engineering, network engineering, service engineering, systems engineering, or industrial controls OR equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Nice To Haves

  • Data center controls domain depth: Prior experience in data center controls, including BMS and EPMS lifecycle support, troubleshooting, and project delivery (scope/schedule/vendor management).
  • Controls engineering tooling: Experience with PLC/HMI/SCADA development or modification (logic, graphics, alarming, historians), including commissioning practices such as FAT/SAT and structured validation.
  • OT networking & reliability: Understanding of OT networking fundamentals (segmentation, redundancy, routing basics), and the ability to troubleshoot field network issues that impact telemetry stability and latency.
  • Scripting and data analysis: Ability to use tools like Python/PowerShell, SQL, or equivalent to automate validation, analyze large telemetry datasets, and speed root-cause investigation.
  • Operational excellence: Experience building repeatable standards, reducing incident rates, improving telemetry quality KPIs, and contributing to continuous improvement in mission-critical operations.
  • Safety and compliance familiarity: Exposure to industrial safety/compliance practices and standards relevant to controls environments (e.g., change control rigor, audit readiness, secure configuration practices).
  • Broader industry controls experience: Background in utilities/energy & renewables, oil & gas pipelines, manufacturing automation, chemical/pharma processing, aerospace/automotive test systems, or robotics, especially where uptime, safety, and data integrity are critical.

Responsibilities

  • Deliver telemetry onboarding and operations at scale: Configure, validate, and maintain high-availability telemetry from CE systems (electrical + mechanical) using industrial control and monitoring systems (e.g., SCADA, EPMS, BMS/BAS).
  • Deep troubleshooting and root cause analysis: Diagnose issues across control networks and telemetry stacks (field devices → gateways/connectors → SCADA/servers → cloud ingestion), including protocol-level troubleshooting (e.g., Modbus/BACnet register mapping, comms reliability, polling performance, device addressing).
  • Commissioning, integration, and quality: Support commissioning activities and integrations for new/retrofit sites; ensure telemetry meets defined quality standards (accuracy, completeness, timeliness, stability) before release to production stakeholders.
  • Operate with live-site accountability: Partner with datacenter operations and engineering teams to resolve incidents, drive corrective actions, and prevent recurrence; participate in an on-call rotation as needed to support mission-critical environments.
  • Partner and lead cross-functionally: Collaborate with internal teams (operations, engineering, platform/software, security), system integrators, and equipment vendors to deliver telemetry solutions and resolve systemic issues.
  • Improve standards and automation: Contribute to standard telemetry architectures, repeatable onboarding playbooks, and automated validation/troubleshooting approaches; identify gaps and propose roadmap improvements.
  • Documentation and enablement: Create and maintain clear technical documentation (site onboarding guides, point mapping standards, troubleshooting runbooks) and train engineers/operators on telemetry systems and best practices.
  • Security and reliability mindset: Apply strong operational rigor for OT systems (change control, access management, resilient designs, safe rollout practices) to protect uptime and reduce risk.