Sr. Database Site Reliability Engineer (DB SRE)

McKesson • Columbus, OH

Remote

About The Position

McKesson is an impact-driven, Fortune 10 company focused on making quality healthcare more accessible and affordable. The Sr. Database Site Reliability Engineer (DB SRE) is responsible for owning the reliability, availability, and operational maturity of business-critical Azure PostgreSQL platforms supporting the CoverMyMeds (CMM) platform. This role applies Site Reliability Engineering (SRE) principles to database services, emphasizing automation, Infrastructure as Code, observability, and resilience in a modern cloud-native environment. The position is part of Platform Core Engineering, collaborating with application, platform, security, and network engineering teams. The DB SRE will manage database reliability end-to-end across development, staging, and production environments. While remote work is allowed, candidates residing in the Columbus, OH metropolitan area will be prioritized. Sponsorship is not available for this role.

Requirements

Bachelor’s degree preferred; relevant experience considered in lieu of degree
Typically 7+ years of relevant experience in SRE, platform, or infrastructure engineering roles
7+ years of hands‑on experience operating PostgreSQL databases in cloud environments (Azure strongly preferred)
Strong production experience (7+ years) supporting high‑availability, business‑critical database platforms
Deep expertise with Infrastructure as Code, particularly Terraform
Experience owning or participating in on‑call rotations and incident response
Strong understanding of database operations, including performance tuning, replication, backup/restore, and recovery
Experience designing and operating database observability and monitoring solutions
Solid knowledge of cloud security principles, including least‑privilege access and audit readiness
Proven ability to communicate effectively with technical and non‑technical stakeholders

Nice To Haves

Background as an SRE with strong database depth (not a traditional DBA role)
Experience with CI/CD pipelines and Git/GitOps workflows
Familiarity with Kubernetes (AKS preferred), Helm, and ArgoCD
Experience operating stateful workloads in Azure cloud environments
Exposure to regulated or highly controlled environments
Broader cloud platform experience beyond databases

Responsibilities

Own and continuously improve the reliability, availability, and performance of Azure PostgreSQL platforms across dev, stage, and prod
Design, build, and operate cloud database infrastructure using Infrastructure as Code (Terraform)
Apply SRE principles to stateful systems, including environment isolation, blast‑radius reduction, and automation-first operations
Define and implement database observability (metrics, logs, dashboards, alerts) using enterprise monitoring tools (e.g., Datadog)
Lead incident response for database-related production issues and participate in on‑call rotations
Troubleshoot complex issues across performance, replication, connectivity, failover, and permissions
Define and validate high availability, backup, restore, disaster recovery, and point‑in‑time recovery (PITR) strategies
Enforce least‑privilege access, support audits, and ensure compliance with security and governance requirements
Collaborate with platform, application, security, and network teams to design scalable, secure database architectures
Provide senior technical leadership, set reliability standards, and mentor less‑experienced engineers