Sr. Site Reliability Engineer

Vertafore • Denver, CO

$110,000 - $125,000

About The Position

Vertafore is a leading technology company whose innovative software solutions are advancing the insurance industry. Our suite of products helps customers better manage their business, boost productivity and efficiency, and lower costs while strengthening relationships. Our mission is to move InsurTech forward by putting people at the heart of the industry. We are leading the way through product innovation, technology partnerships, and a focus on customer success. Our fast-paced, collaborative environment inspires us to create, think, and challenge each other in ways that make our solutions and our teams better. We are headquartered in Denver, Colorado, with offices across the U.S., Canada, and India.

We are seeking a Senior Site Reliability Engineer to own the reliability, scalability, performance, and operational integrity of critical production services. This role is accountable for the full service lifecycle, from design and deployment readiness through production operations, incident response, and continuous improvement. Reliability is a core engineering responsibility here. The role therefore requires strong software engineering skills and the ability to operate autonomously across AWS, hybrid data centers, and customer-hosted environments.

Requirements

8+ years of hands-on Site Reliability Engineering or reliability-focused engineering experience with end-to-end service ownership.
Proven track record operating at a senior engineering scope, with accountability for reliability outcomes.
Strong software engineering skills in C#, .NET, Java, Python, React, or similar technologies.
Practical experience applying SRE principles (SLIs, SLOs, error budgets).
Hands-on experience with AWS, Kubernetes, CI/CD, infrastructure as code, and hybrid environments.
Strong knowledge of Linux and Windows systems, application platforms, and relational databases.
Bachelor’s or master’s degree in computer science or equivalent experience.
Willingness to participate in an on-call rotation and work flexible hours as required.
A fast learner.
A problem solver.
Ability to document procedures.
Able to meet deadlines.
Strong communication skills, with the ability to convey information effectively to both technical and non-technical audiences.
Able to comply with processes and procedures.
Able to maintain professional composure in any situation.
Flexible to work extended hours occasionally or as required.
High-speed internet access to support working from home.
Occasional travel to our office location is required.

Nice To Haves

Exposure to the insurance industry is desired but not required.

Responsibilities

Own production services end to end.
Be accountable for reliability, availability, scalability, performance, and operational health.
Define and manage SLIs and SLOs, using error budgets to guide delivery decisions.
Influence service and system design to improve fault tolerance, observability, and operational sustainability.
Debug complex production issues across application code, services, and infrastructure using software engineering practices.
Perform root cause analysis using logs, metrics, traces, and code-level investigation.
Build automation and self-healing mechanisms to prevent repeat failures.
Execute production changes (patching, certificate management, software releases) with safety, automation, and observability.
Design and operate production observability aligned to service health and customer impact.
Lead and participate in incident response for high-severity events.
Collaborate with engineering, product, architecture, and operations teams.
Operate with autonomy and sound judgment in reliability decisions.