Site Reliability Engineer II

Vertafore, Denver, CO
$75,000 - $120,000

About The Position

Vertafore is a leading technology company whose innovative software solutions are advancing the insurance industry. Our suite of products helps customers better manage their business, boost productivity and efficiency, and lower costs while strengthening relationships. Our mission is to move InsurTech forward by putting people at the heart of the industry. We are leading the way through product innovation, technology partnerships, and a focus on customer success. Our fast-paced, collaborative environment inspires us to create, think, and challenge each other in ways that make our solutions and our teams better. We are headquartered in Denver, Colorado, with offices across the U.S., Canada, and India.

Role Summary

We are seeking a Site Reliability Engineer II to support the reliability, scalability, and performance of critical production services. This role contributes across the full service lifecycle, helping transition services from deployment readiness into stable production operations. At Vertafore, we treat operations as a software problem: you will work alongside Senior SREs to apply engineering rigor to our AWS and hybrid environments.

Requirements

  • Experience: 2 to 3.5 years of hands-on experience in SRE, DevOps, or a software engineering role with a focus on system stability.
  • SRE Fundamentals: Understanding of core SRE principles such as SLIs, SLOs, and error budgets.
  • Coding Skills: Proficiency in at least one language or framework such as C#/.NET, Java, Python, or React.
  • Technical Skills: Experience with AWS, CI/CD pipelines (GitLab or Jenkins), and infrastructure as code.
  • Systems Knowledge: Working knowledge of Linux and Windows environments and relational databases.
  • Education: Bachelor’s degree in Computer Science or a related technical field.

Responsibilities

  • Service Maintenance: Contribute to the operational health and performance of assigned production services.
  • Observability Implementation: Assist in building and maintaining observability frameworks. Help track the Four Golden Signals (latency, traffic, errors, and saturation) to ensure service health is visible.
  • SLO Contribution: Participate in monitoring SLIs and SLOs, providing data to help the team manage error budgets effectively.
  • Toil Reduction and Guided Debugging: Work on projects to automate manual and repetitive tasks using scripting, programming, or AI tools. Troubleshoot production issues across infrastructure and application code, implementing durable solutions instead of quick fixes.
  • Deployment Execution: Support production changes such as patching and software releases using established automated pipelines and safety-first practices.
  • Active Incident Response: Participate in incident response for production events and join on-call rotations.
  • Postmortem Contribution: Assist in root cause analysis and contribute to blameless postmortems to help the team learn from failures.