Software Engineer (SRE) - Remote

UnitedHealth Group•Eden Prairie, MN

$72,800 - $130,000•Remote

About The Position

Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data, and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits, and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.

We are seeking a Software Engineer with strong Site Reliability Engineering (SRE) capabilities to support a critical modernization initiative within a call center applications team. This role will focus on improving application stability, supporting cloud-based deployments, and enabling the transformation of legacy call center platforms into digital self-service chat and voice bot solutions. The engineer will work collaboratively across multiple teams, supporting cloud environments, deployments, and incident response activities to ensure highly available and resilient systems.

If you reside in Minnesota or the District of Columbia, you will enjoy the flexibility of a hybrid-remote role as you take on some tough challenges.

Requirements

Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
5+ years of hands-on experience deploying and supporting applications on cloud platforms (GCP, AWS, Azure)
4+ years of experience working in a Site Reliability Engineering (SRE), DevOps, or production support-focused software engineering role
3+ years of experience with cloud-based deployments and CI/CD pipelines, including troubleshooting deployment and region-related issues
3+ years of experience with application performance monitoring and logging tools (Splunk, Dynatrace, Grafana)
3+ years of experience writing scripts or tools in Python, React, or similar languages
Experience with Terraform and provisioning environments
Experience querying logs, tracing transactions, and identifying root causes of application issues

Nice To Haves

Experience supporting large-scale or legacy system modernization efforts
Familiarity with call center platforms or digital self-service / bot-enabled applications
Experience with global delivery models and teams based in multiple regions
Proven cross-team collaboration skills, with the ability to work across distributed teams and time zones

Responsibilities

Partner with cross-functional engineering and operations teams to ensure application reliability, stability, and performance
Support cloud environment provisioning, readiness, and ongoing operations
Assist with and monitor pipeline setup and cloud deployments, including daily and nightly deployments
Participate in production support, war rooms, and incident response efforts, helping to diagnose and resolve issues quickly
Debug issues across regions by tracing logs and analyzing system behavior in cloud environments
Leverage application performance monitoring tools to identify, troubleshoot, and prevent system issues
Support the reliability, availability, and performance of distributed systems across cloud, edge, and device environments
Help define, measure, and monitor SLIs and SLOs for services
Identify reliability risks and collaborate with senior engineers on mitigation plans
Participate in on-call rotations and assist with incident response and post-incident reviews
Contribute improvements to runbooks, automation, and tooling that reduce alert noise and operational toil
Help enhance detection, alerting, and response workflows
Implement and improve telemetry using OpenTelemetry, Grafana, Splunk, and related tools
Build dashboards and tools that improve visibility into system health and AI service behavior
Ensure observability data is complete, accurate, and actionable
Support safe, reliable deployment workflows including canaries, staged rollouts, and automated rollbacks
Assist in improving CI/CD systems and deployment tooling
Work closely with senior SREs, DevOps engineers, AI/ML teams, and platform engineers
Contribute to reliability reviews, operational readiness checks, and cross-team projects
Advocate for modern SRE and DevOps practices within the organization