Dir I&O Engineering - Remote

UnitedHealth Group•Minnetonka, MN

21h•$134,600 - $230,800•Remote

About The Position

This role designs, engineers, and manages infrastructure and operational platforms, including IaaS, PaaS, and foundational components. It establishes reliability and performance standards (SLIs/SLOs/SLAs), drives infrastructure automation and toil reduction, strengthens incident and problem management, and partners with Product and Engineering to embed infrastructure resilience and operability "by design" from architecture through delivery. Success requires disciplined governance, strong executive communication, and the ability to align cross-functional stakeholders to a prioritized roadmap tied to member/employer experience, operational risk reduction, and regulatory/compliance expectations.

You’ll enjoy the flexibility to work remotely from anywhere within the U.S. as you take on some tough challenges. For all hires in the Minneapolis or Washington, D.C. area, you will be required to work in the office a minimum of four days per week.

Primary responsibilities fall into three areas: Leadership and Strategy, Infrastructure & Operations Engineering, and Technology Operations. Each is itemized in the Responsibilities section.

You’ll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role as well as provide development for other roles you may be interested in.

Requirements

10+ years of experience in infrastructure engineering, operations, or platform engineering roles with increasing scope and responsibility
5+ years of leadership experience managing infrastructure or I&O engineering teams
Solid experience with Infrastructure-as-Code (Terraform, Ansible, Chef, Puppet), CI/CD pipelines, and automation frameworks
Proven experience in incident, problem, and change management (ITIL/ITSM frameworks)
Experience managing vendor relationships and driving accountability for infrastructure services
Proven expertise in cloud platforms (Azure, AWS, or GCP), including IaaS, PaaS, networking, and security services
Demonstrated ability to define and govern SLIs, SLOs, and SLAs for infrastructure services
Proven executive communication skills, with the ability to translate technical infrastructure concepts into business impact

Nice To Haves

Certifications such as AWS Solutions Architect, Azure Solutions Architect, ITIL, or equivalent
Experience in healthcare technology or regulated industries
Experience with AIOps, predictive analytics, or machine learning for infrastructure optimization
Knowledge of containerization (Kubernetes, Docker) and cloud-native architectures
Familiarity with observability platforms (Splunk, Dynatrace, Datadog, Grafana)

Responsibilities

Develop and execute a comprehensive strategy for I&O Engineering aligned with organizational goals, with a focus on improving stability, resilience, infrastructure performance, and supportability across UHCT Employer & Individual (E&I) products, platforms, and critical business journeys
Build, lead, and mentor a high-performing team of I&O Engineering professionals, fostering a culture of collaboration, innovation, and continuous improvement
Collaborate with cross-functional leaders and engineering teams to integrate infrastructure best practices into all aspects of E&I products and platforms, "baking in" resilience from design to deployment
Guide teams on priorities, mentor individual contributors, and report to CIOs on critical paths, mitigation plans, and strategic initiatives
Work closely with business and technology stakeholders to develop roadmaps for infrastructure technology portfolios, resolve cross-system and domain dependencies, and ensure effective integration among services offered to the end customer
Design, engineer, and manage infrastructure and operational platforms supporting E&I products and services, including cloud (IaaS/PaaS), on-premises, and hybrid environments
Oversee infrastructure provisioning, configuration management, and capacity planning to ensure scalable, reliable, and cost-effective service delivery
Analyze and model infrastructure dependencies (applications, APIs, networking, storage, compute) and proactively identify risks including natural disasters, cyberattacks, and hardware/software failures
Define and enforce infrastructure standards, reference architectures, and engineering best practices across the E&I portfolio
Lead infrastructure automation initiatives using Infrastructure-as-Code (IaC), CI/CD pipelines, and self-service provisioning to reduce manual toil and accelerate delivery
Establish and maintain monitoring, alerting, and observability frameworks to ensure proactive detection and rapid resolution of infrastructure issues
Drive cloud migration and modernization strategies, ensuring workloads are optimally placed across cloud-native, containerized, and traditional platforms
Monitor technological advancements and industry trends to influence company standards and ensure solutions are continuously improved through product management practices, including recommendations to invest in a solution or to retire redundant or out-of-date systems
Lead and continuously improve incident, problem, and change management processes across E&I infrastructure and platforms
Establish and enforce SLIs, SLOs, and SLAs across all infrastructure services, with clear escalation paths and accountability models
Drive automation of operational runbooks, alerting, and remediation workflows to reduce MTTR and increase system availability
Coordinate with application teams and vendors to ensure infrastructure readiness for peak season, major releases, and business-critical events
Maintain operational dashboards providing real-time visibility into infrastructure health, performance, and capacity utilization