Senior AWS DevOps Engineer

Zelis • St. Louis, MO

Hybrid

About The Position

This role leads the design, reliability, and scalability of the enterprise cloud platform on AWS, operating at the intersection of Site Reliability Engineering, platform engineering, and cloud architecture. It defines the SRE operating model and platform vision, establishing reliability north-star architectures and internal developer platform standards that provide paved roads for secure, observable, and scalable services. The position owns mission-critical production systems and drives organization-wide initiatives such as zero-touch operations, governance-by-default, and resilience posture. It sets and implements best practices across infrastructure, CI/CD, observability, and cost optimization, while integrating reliability with security and compliance requirements. Working closely with engineering teams, this role improves system performance and developer productivity, mentors Staff and Principal engineers, and partners with executive leadership on risk management, customer commitments, and regulatory readiness for critical systems.

Requirements

Significant experience in DevOps, SRE, or platform engineering roles
Deep expertise in AWS architecture and services (EC2, VPC, IAM, S3, RDS, EKS/ECS, Fargate, Lambda, CloudFront)
Strong experience with Infrastructure as Code (Terraform preferred)
Hands-on experience with Kubernetes (EKS), ECS, and container orchestration
Proven experience building and managing CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, or similar)
Strong background in Linux systems, networking, and distributed systems
Proficiency in scripting or programming (Python, Bash, or similar)
Experience with observability and monitoring tools (Datadog preferred)
Track record of improving system reliability, scalability, and performance

Nice To Haves

Experience managing multi-account AWS environments
Background in high-scale systems (large datasets, high-throughput applications)
Database performance tuning experience (PostgreSQL and Microsoft SQL Server preferred)
Experience in regulated environments (security, audit, compliance)

Responsibilities

Design and operate scalable, fault-tolerant AWS architectures
Lead implementation of Infrastructure as Code using Terraform or CloudFormation
Build and evolve CI/CD pipelines to enable rapid, reliable software delivery
Own Kubernetes (EKS) platform architecture, scalability, and operational excellence
Establish and drive observability strategy (metrics, logging, tracing) using tools like Datadog
Define and enforce Service Level Objectives (SLOs) and reliability standards
Lead incident response, root cause analysis, and postmortems
Optimize infrastructure for cost, performance, and security compliance
Mentor engineers and influence platform and architectural decisions across teams