Senior AWS DevOps Engineer

Zelis•Atlanta, GA

49d•Hybrid

About The Position

At Zelis, we Get Stuff Done. So, let’s get to it!

A Little About Us

Zelis is modernizing the healthcare financial experience across payers, providers, and healthcare consumers. We serve more than 750 payers, including the top five national health plans, regional health plans, TPAs and millions of healthcare providers and consumers across our platform of solutions. Zelis sees across the system to identify, optimize, and solve problems holistically with technology built by healthcare experts – driving real, measurable results for clients.

At Zelis, AI is woven into the fabric of how we work. Every associate is expected - and empowered - to partner with AI to challenge the status quo, accelerate innovation, and amplify their impact. This is a place for builders with a growth mindset who act with agility, embrace change, and use modern technology to shape smarter solutions, exceptional experiences, and the future of our industry for our clients, customers, and our culture.

A Little About You

You bring a unique blend of personality and professional expertise to your work, inspiring others with your passion and dedication. Your career is a testament to your diverse experiences, community involvement, and the valuable lessons you've learned along the way. You are more than just your resume; you are a reflection of your achievements, the knowledge you've gained, and the personal interests that shape who you are.

Position Overview

This role leads the design, reliability, and scalability of the enterprise cloud platform on AWS, operating at the intersection of Site Reliability Engineering, platform engineering, and cloud architecture. It defines the SRE operating model and platform vision, establishing reliability north-star architectures and internal developer platform standards that provide paved roads for secure, observable, and scalable services.

The position owns mission-critical production systems and drives organization-wide initiatives such as zero-touch operations, governance-by-default, and resilience posture. It sets and implements best practices across infrastructure, CI/CD, observability, and cost optimization, while integrating reliability with security and compliance requirements. Working closely with engineering teams, this role improves system performance and developer productivity, mentors Staff and Principal engineers, and partners with executive leadership on risk management, customer commitments, and regulatory readiness for critical systems.

Requirements

Significant experience in DevOps, SRE, or platform engineering roles
Deep expertise in AWS architecture and services (EC2, VPC, IAM, S3, RDS, EKS/ECS, Fargate, Lambda, CloudFront)
Strong experience with Infrastructure as Code (Terraform preferred)
Hands-on experience with Kubernetes (EKS), ECS, and container orchestration
Proven experience building and managing CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, or similar)
Strong background in Linux systems, networking, and distributed systems
Proficiency in scripting or programming (Python, Bash, or similar)
Experience with observability and monitoring tools (Datadog preferred)
Track record of improving system reliability, scalability, and performance

Nice To Haves

Experience managing multi-account AWS environments
Background in high-scale systems (large datasets, high-throughput applications)
Database performance tuning experience (PostgreSQL and Microsoft SQL Server preferred)
Experience in regulated environments (security, audit, compliance)

Responsibilities

Architecture and operation of large-scale, highly available AWS environments
Reliability and performance of distributed systems supporting production workloads
Design and implementation of Infrastructure as Code (IaC) and platform automation
Evolution of container platforms (ECS, EKS)
Organization-wide DevOps and SRE best practices, including SLO-driven engineering
Cloud cost optimization and governance across multi-account environments
Design and operate scalable, fault-tolerant AWS architectures
Lead implementation of Infrastructure as Code using Terraform or CloudFormation
Build and evolve CI/CD pipelines to enable rapid, reliable software delivery
Own Kubernetes (EKS) platform architecture, scalability, and operational excellence
Establish and drive observability strategy (metrics, logging, tracing) using tools like Datadog
Define and enforce Service Level Objectives (SLOs) and reliability standards
Lead incident response, root cause analysis, and postmortems
Optimize infrastructure for cost, performance, and security compliance
Mentor engineers and influence platform and architectural decisions across teams