Senior Site Reliability Engineer

Lantern
Dallas, TX
Hybrid

About The Position

Lantern is the specialty care platform connecting people with the best care when they need it most. By curating a Network of Excellence composed of the nation's top specialists for surgery, cancer care, infusions and more, Lantern delivers excellent care with significant cost savings to employers and their workforces. Lantern also pairs members with a dedicated care team, including Care Advocates and nurses, for the entirety of their care journey, helping them get back to good health, back to their families and back to work. With convenient access to specialists nationwide, Lantern puts quality care within driving distance for most members. Lantern is trusted by the nation's largest employers to deliver care to more than 6 million members across the country.

The company values LOGIC in decision making, INCLUSION as a core tenet, GRIT to tackle big problems, HUMANITY in all decisions for customers, and TRUTH through integrity, all while thriving in a Team Environment. These pillars of LIGHT remind the team that they are making a difference by providing guidance and support in navigating the often complex and confusing landscape of healthcare.

Lantern is seeking an experienced Senior Site Reliability Engineer to champion the reliability, availability, and performance of its Azure-based healthcare platform. In this pivotal role, you will define and implement SRE practices, drive incident management processes, build observability frameworks, and ensure systems meet stringent uptime and compliance requirements. You will collaborate with platform engineers, application developers, and security teams to embed reliability into every layer of the infrastructure. This role is ideal for an SRE expert with deep experience in production operations, monitoring, incident response, and automation in cloud environments.
You will work on the Platform Engineering team, partnering with application developers, infrastructure engineers, and security teams to establish SRE best practices across Lantern. Your focus will be on building resilience, reducing toil through automation, and creating a culture of reliability that ensures the healthcare platform delivers consistent, high-quality service to its users.

Requirements

  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field, or equivalent practical experience.
  • 4+ years in SRE, DevOps, or production operations roles
  • 3+ years with Microsoft Azure (AWS/GCP a plus)
  • Strong experience with observability tools (Datadog, Azure Monitor, Prometheus, Grafana, or similar)
  • Experience defining and managing SLOs/SLIs and error budgets
  • Proven incident management and on-call experience (Rootly or similar incident management platforms)
  • Hands-on experience with Infrastructure as Code (Terraform) and CI/CD (Azure DevOps, GitHub Actions)
  • Experience in regulated environments (healthcare/HIPAA preferred)
  • Strong scripting skills (Python, Bash, PowerShell)
  • Excellent communication and collaboration skills

Nice To Haves

  • Deep experience with chaos engineering and reliability testing
  • Experience with Azure Kubernetes Service and containerized workloads
  • Relevant certifications (Azure, SRE, Kubernetes)

Responsibilities

  • Define and track SLOs/SLIs/error budgets for critical healthcare services
  • Build and maintain observability platforms (monitoring, logging, alerting, tracing) using Datadog and Azure Monitor
  • Lead incident management processes using Rootly, including on-call rotations, runbooks, and post-incident reviews
  • Automate operational toil through Infrastructure as Code (Terraform) and custom tooling
  • Design and implement disaster recovery and business continuity strategies
  • Collaborate with development teams to improve service reliability through architecture reviews and chaos engineering
  • Optimize system performance, capacity planning, and cost efficiency for Azure infrastructure
  • Ensure production systems meet HIPAA, SOC 2, and other regulatory requirements
  • Maintain and improve CI/CD pipelines to support safe, rapid deployments
  • Mentor junior engineers and foster a culture of reliability and operational excellence

Benefits

  • Medical Insurance
  • Dental Insurance
  • Vision Insurance
  • Short & Long Term Disability
  • Life Insurance
  • 401k with company match
  • Flexible Time Off
  • Paid Parental Leave