Senior AWS DevOps Engineer

Zelis
Atlanta, GA
Hybrid

About The Position

At Zelis, we Get Stuff Done. So, let’s get to it!

A Little About Us

Zelis is modernizing the healthcare financial experience across payers, providers, and healthcare consumers. We serve more than 750 payers, including the top five national health plans, regional health plans, TPAs and millions of healthcare providers and consumers across our platform of solutions. Zelis sees across the system to identify, optimize, and solve problems holistically with technology built by healthcare experts – driving real, measurable results for clients.

At Zelis, AI is woven into the fabric of how we work. Every associate is expected - and empowered - to partner with AI to challenge the status quo, accelerate innovation, and amplify their impact. This is a place for builders with a growth mindset who act with agility, embrace change, and use modern technology to shape smarter solutions, exceptional experiences, and the future of our industry for our clients, customers, and our culture.

A Little About You

You bring a unique blend of personality and professional expertise to your work, inspiring others with your passion and dedication. Your career is a testament to your diverse experiences, community involvement, and the valuable lessons you've learned along the way. You are more than just your resume; you are a reflection of your achievements, the knowledge you've gained, and the personal interests that shape who you are.

Position Overview

This role leads the design, reliability, and scalability of the enterprise cloud platform on AWS, operating at the intersection of Site Reliability Engineering, platform engineering, and cloud architecture. It defines the SRE operating model and platform vision, establishing reliability north-star architectures and internal developer platform standards that provide paved roads for secure, observable, and scalable services.
The position owns mission-critical production systems and drives organization-wide initiatives such as zero-touch operations, governance-by-default, and resilience posture. It sets and implements best practices across infrastructure, CI/CD, observability, and cost optimization, while integrating reliability with security and compliance requirements. Working closely with engineering teams, this role improves system performance and developer productivity, mentors Staff and Principal engineers, and partners with executive leadership on risk management, customer commitments, and regulatory readiness for critical systems.

Requirements

  • Significant experience in DevOps, SRE, or platform engineering roles
  • Deep expertise in AWS architecture and services (EC2, VPC, IAM, S3, RDS, EKS/ECS, Fargate, Lambda, CloudFront)
  • Strong experience with Infrastructure as Code (Terraform preferred)
  • Hands-on experience with Kubernetes (EKS), ECS and container orchestration
  • Proven experience building and managing CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, or similar)
  • Strong background in Linux systems, networking, and distributed systems
  • Proficiency in scripting or programming (Python, Bash, or similar)
  • Experience with observability and monitoring tools (Datadog preferred)
  • Track record of improving system reliability, scalability, and performance

Nice To Haves

  • Experience managing multi-account AWS environments
  • Background in high-scale systems (large datasets, high-throughput applications)
  • Database performance tuning experience (PostgreSQL and Microsoft SQL Server preferred)
  • Experience in regulated environments (security, audit, compliance)

Responsibilities

  • Architecture and operation of large-scale, highly available AWS environments
  • Reliability and performance of distributed systems supporting production workloads
  • Design and implementation of Infrastructure as Code (IaC) and platform automation
  • Evolution of container platforms (ECS, EKS)
  • Organization-wide DevOps and SRE best practices, including SLO-driven engineering
  • Cloud cost optimization and governance across multi-account environments
  • Design and operate scalable, fault-tolerant AWS architectures
  • Lead implementation of Infrastructure as Code using Terraform or CloudFormation
  • Build and evolve CI/CD pipelines to enable rapid, reliable software delivery
  • Own Kubernetes (EKS) platform architecture, scalability, and operational excellence
  • Establish and drive observability strategy (metrics, logging, tracing) using tools like Datadog
  • Define and enforce Service Level Objectives (SLOs) and reliability standards
  • Lead incident response, root cause analysis, and postmortems
  • Optimize infrastructure for cost, performance, and security compliance
  • Mentor engineers and influence platform and architectural decisions across teams

Benefits

  • Measurable improvements in system uptime, latency, and reliability
  • Faster, safer deployments with reduced failure rates
  • Reduced cloud spend through architectural and operational optimizations
  • Strong DevOps culture adoption across engineering teams