Director, Site Reliability Engineering

Vertafore•Denver, CO

2d•$175,000 - $220,000

About The Position

The insurance industry runs on Vertafore. We equip agencies, MGAs, and carriers with the core digital systems, specialized AI, and data-driven foundation to eliminate distribution drag across the insurance lifecycle, spanning sales, servicing, and back-office operations. Underpinned by unmatched speed and performance power, we are the trusted backbone that’s taking the insurance industry from friction to flow with Distribution Velocity (speed, performance, and trust) to drive growth at scale. With over 95% of the top agencies and insurers and 50% of industry compliance transactions running through Vertafore, we lead at the intersection of innovation and trust, giving insurance professionals the confidence to transform and win in the AI era. Our reach is global, with headquarters in Denver, Colorado, and offices across the U.S., Canada, and India.

The Director, Site Reliability Engineering (SRE) will lead reliability, performance, and observability initiatives for a portfolio of Vertafore products. This role owns SLIs/SLOs, incident response, automation, and CI/CD practices for assigned product families. Directors will manage multiple teams and collaborate with Product Development, Architecture, Cloud Operations, Information Security, and other SRE leaders to ensure operational excellence. This role is responsible for bridging the gap between development and operations by applying a software engineering mindset to system administration. You will own the lifecycle of services, from inception and design through deployment, operation, and refinement.

Requirements

Apply a software engineering mindset to system administration.
Own the lifecycle of services, from inception and design through deployment, operation, and refinement.

Responsibilities

Define and enforce SLIs/SLOs for a subset of Vertafore flagship products.
Drive observability strategy across application and infrastructure layers.
Oversee CI/CD pipelines for product deployments using tools such as GitLab, Jenkins, Ansible, and LaunchDarkly.
Monitor and cap "Toil" (manual, repetitive operational work) at 50% of team time using automation and AI tools, ensuring the team spends the remaining time on project work that scales the system.
Manage "Error Budgets" to balance the velocity of feature releases with the stability of the platform, ensuring clear consequences when budgets are exhausted.
Define and participate in 24x7 on-call rotations for assigned products; ensure rapid resolution and blameless postmortems.
Partner with Cloud Ops on capacity planning, OS patching (app tier), and load balancing (ALB, F5).
Align reliability goals with product roadmaps and customer SLAs.
Manage a group of managers and engineers, and mentor teams on automation, observability, and reliability best practices.
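
For candidates less familiar with the terms above, the toil cap and the error budget both reduce to simple arithmetic. The following is a minimal illustrative sketch in Python, not a description of Vertafore's tooling. The 50% toil cap comes from this posting; the names `toil_fraction`, `within_toil_cap`, and `ErrorBudget`, the request-based definition of the budget, and any SLO target passed in are assumptions made for illustration.

```python
from dataclasses import dataclass

# From the posting: toil is capped at 50% of team time.
TOIL_CAP = 0.50


def toil_fraction(toil_hours: float, total_hours: float) -> float:
    """Share of engineering time spent on manual, repetitive work."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return toil_hours / total_hours


def within_toil_cap(toil_hours: float, total_hours: float,
                    cap: float = TOIL_CAP) -> bool:
    """True when the measured toil share does not exceed the cap."""
    return toil_fraction(toil_hours, total_hours) <= cap


@dataclass
class ErrorBudget:
    """Request-based error budget for one SLO window (hypothetical model)."""
    slo_target: float      # assumed target, e.g. 0.999 for "99.9% success"
    total_requests: int    # requests served in the window
    failed_requests: int   # requests that violated the SLI

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        return self.allowed_failures - self.failed_requests

    @property
    def exhausted(self) -> bool:
        # An exhausted budget is the point where agreed consequences,
        # such as pausing feature releases, would apply.
        return self.remaining <= 0


if __name__ == "__main__":
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000,
                         failed_requests=400)
    print("budget exhausted:", budget.exhausted)
    print("toil within cap:", within_toil_cap(toil_hours=18, total_hours=40))
```

In this model, a team that has used up its budget for the window would shift effort from feature releases to stability work until the next window begins, which is one common way to give the budget "clear consequences".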
