Principal Platform Reliability Engineer

Lilly | Indianapolis, IN
$126,000 - $224,400 | Hybrid

About The Position

Eli Lilly and Company seeks a Platform Site Reliability Engineer to join the Software Product Engineering (SPE) Customer Operations team. You will design, operate, and continuously improve highly available, scalable, and fault-tolerant systems across cloud environments. You will play a critical role in establishing reliability standards, driving operational excellence, and enabling engineering teams to build and deploy with confidence.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related technical field
  • 7+ years of hands-on experience with AWS
  • Extensive experience with Kubernetes and containerization technologies (Docker, EKS, etc.)
  • Experience operating production-grade distributed systems
  • Experience in incident management and on-call support models
  • Experience defining and managing SLOs, SLIs, and error budgets
  • Hands-on experience with observability tools such as Splunk and the LGTM stack
  • Experience building and maintaining CI/CD pipelines
  • Proficiency with Infrastructure as Code tools (Terraform, CloudFormation, etc.)
  • Experience with scripting in Python, Bash, or PowerShell
  • Experience with networking and cloud architecture fundamentals
  • Experience implementing security best practices in cloud environments
  • Experience troubleshooting complex system and performance issues

Nice To Haves

  • Experience with tools such as ArgoCD, GitHub Actions, or GitOps workflows
  • Familiarity with large-scale enterprise platforms and environments
  • Experience in regulated industries such as healthcare or pharma
  • Exposure to global support models and follow-the-sun operations
  • Strong written communication skills, including crafting incident updates, postmortems, and status summaries for mixed audiences

Responsibilities

  • Define and implement SLOs, SLIs, and reliability standards that establish a consistent foundation for platform health, driving resilience through capacity planning, failover design, and disaster recovery strategies
  • Lead response for P1/P2 incidents, owning rapid mitigation and recovery while conducting thorough root cause analysis and implementing corrective actions that prevent recurrence
  • Develop and maintain runbooks, playbooks, and operational standards that enable the broader engineering organization to respond effectively and consistently
  • Implement and optimize observability frameworks spanning monitoring, logging, tracing, and alerting — improving system visibility and reducing alert noise through actionable, signal-driven insights
  • Leverage platforms such as Splunk, Prometheus, CloudWatch, or equivalent tooling to ensure teams have the telemetry they need to detect, diagnose, and resolve issues proactively
  • Build and maintain CI/CD pipelines and deployment automation; drive adoption of Infrastructure as Code and GitOps practices across engineering teams
  • Support engineering teams in integrating SRE principles throughout the software lifecycle
  • Implement secure-by-design practices across infrastructure and platforms, support vulnerability remediation and secure configurations, and ensure alignment with enterprise security and compliance standards
  • Partner with engineering teams to improve reliability, performance, and deployment practices
  • Provide technical guidance and mentorship to engineers, and communicate system health and incident impact clearly to stakeholders at all levels

Benefits

  • company bonus (depending, in part, on company and individual performance)
  • company-sponsored 401(k)
  • pension
  • vacation benefits
  • medical, dental, vision and prescription drug benefits
  • flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts)
  • life insurance and death benefits
  • certain time off and leave of absence benefits
  • well-being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities)