Site Reliability Engineer II

Vertafore | Denver, CO
$75,000 - $120,000 | Remote

About The Position

Vertafore is a leading technology company whose innovative software solutions are advancing the insurance industry. Our suite of products helps customers better manage their business, boost productivity and efficiency, and lower costs while strengthening relationships. Our mission is to move InsurTech forward by putting people at the heart of the industry. We are leading the way with product innovation, technology partnerships, and a focus on customer success. Our fast-paced and collaborative environment inspires us to create, think, and challenge each other in ways that make our solutions and our teams better. We are headquartered in Denver, Colorado, with offices across the U.S., Canada, and India.

Role Summary

We are seeking a Site Reliability Engineer II to support the reliability, scalability, and performance of critical production services. This role contributes to the full service lifecycle, helping to transition services from deployment readiness into stable production operations. At Vertafore, we treat operations as a software problem: you will work alongside Senior SREs to apply engineering rigor to our AWS and hybrid environments.

Requirements

  • Experience: 2 to 3.5 years of hands-on experience in SRE, DevOps, or a software engineering role with a focus on system stability.
  • SRE Fundamentals: Understanding of core SRE principles such as SLIs, SLOs, and error budgets.
  • Coding Skills: Proficiency in at least one language or framework such as C#/.NET, Java, Python, or React.
  • Technical Skills: Experience with AWS, CI/CD pipelines (GitLab or Jenkins), and infrastructure as code.
  • Systems Knowledge: Working knowledge of Linux and Windows environments and relational databases.
  • Education: Bachelor’s degree in Computer Science or a related technical field.
  • A fast learner.
  • A problem solver.
  • Ability to document procedures.
  • Able to meet deadlines.
  • Good communication skills, with the ability to deliver a message effectively to both technical and non-technical audiences.
  • Able to comply with processes and procedures.
  • Able to maintain professional composure in any situation.
  • Flexibility to work extended hours occasionally or as required.
  • Driven to improve, personally and professionally.
  • Operates best in a fast-paced, flexible work environment and works well as part of a team.
  • High-speed internet access to support working from home.
  • Occasional travel to our office location is required.
  • Occasional lifting and/or moving up to 10 pounds.
  • Frequent repetitive hand and arm movements required to operate a computer.
  • Specific vision abilities required by this job include close vision (working on a computer, etc.).
  • Frequent sitting and/or standing.
  • The selected candidate must be legally authorized to work in the United States.

Nice To Haves

  • Exposure to the insurance industry is desirable but not mandatory.

Responsibilities

Reliability and Observability Support

  • Service Maintenance: Contribute to the operational health and performance of assigned production services.
  • Observability Implementation: Assist in building and maintaining observability frameworks. Help track the Four Golden Signals (latency, traffic, errors, and saturation) to ensure service health is visible.
  • SLO Contribution: Participate in monitoring SLIs and SLOs, providing data to help the team manage error budgets effectively.

Engineering and Automation

  • Toil Reduction and Guided Debugging: Work on projects to automate manual and repetitive tasks using scripting, programming, or AI tools. Troubleshoot production issues across infrastructure and application code, implementing durable solutions instead of quick fixes.
  • Deployment Execution: Support production changes such as patching and software releases using established automated pipelines and safety-first practices.

Incident Participation and Learning

  • Active Incident Response: Participate in incident response for production events and join on-call rotations.
  • Postmortem Contribution: Assist in root cause analysis and contribute to blameless postmortems to help the team learn from failures.