About The Position

Vertafore is a leading technology company whose innovative software solutions are advancing the insurance industry. Our suite of products provides solutions to our customers that help them better manage their business, boost their productivity and efficiencies, and lower costs while strengthening relationships. Our mission is to move InsurTech forward by putting people at the heart of the industry. We are leading the way with product innovation, technology partnerships, and a focus on customer success. Our fast-paced and collaborative environment inspires us to create, think, and challenge each other in ways that make our solutions and our teams better. We are headquartered in Denver, Colorado, with offices across the U.S., Canada, and India.

The SRE Architect is responsible for the technical vision and long-term architectural strategy for reliability across Vertafore’s global product portfolio. You will design the cross-cutting systems, automation frameworks, and observability standards that allow our engineering teams to scale without a linear increase in operational overhead. This role bridges the gap between high-level business strategy and deep technical execution, ensuring that "Reliability by Design" is baked into every layer of our AWS and hybrid infrastructure.

Requirements

  • Deep technical expertise in AWS and hybrid infrastructure.
  • Ability to design for high availability, fault tolerance, and global scalability.
  • Experience defining and implementing standardized infrastructure and deployment patterns.
  • Expertise in architecting global observability strategies, including telemetry for Latency, Traffic, Errors, and Saturation.
  • Experience designing and implementing SLIs, SLOs, and Error Budgets.
  • Ability to act as a technical arbiter for error budget policies.
  • Skills in identifying systemic sources of toil and architecting software solutions for their elimination.
  • Leadership in Infrastructure-as-Code (Terraform, CDK) and configuration management, with a focus on immutable infrastructure.
  • Experience architecting and implementing advanced self-healing and auto-remediation frameworks.
  • Leadership in blameless postmortems and analysis of complex system failures.
  • Ability to mentor Tech Leads and Senior SREs.
  • Strong collaboration skills with product development, architecture, and product owners.

Responsibilities

  • Lead the architectural review of new and existing services to ensure they are built for high availability, fault tolerance, and global scalability.
  • Define the "Golden Paths" for infrastructure and deployment, ensuring that teams use standardized, pre-approved patterns for the Vertafore tech stack.
  • Architect the global observability strategy, ensuring every product family has automated, consistent telemetry for Latency, Traffic, Errors, and Saturation.
  • Design and oversee the organization-wide implementation of SLIs, SLOs, and Error Budgets.
  • Act as the ultimate technical arbiter for error budget policies, ensuring they are used as a mathematical contract to balance feature velocity and system stability.
  • Identify systemic sources of toil across the enterprise and architect software solutions to eliminate them globally, while maintaining a 50% ratio of ops to project work.
  • Lead the strategy for Infrastructure-as-Code (Terraform, CDK), AI tooling and technologies, and configuration management, moving the organization toward a fully immutable infrastructure model.
  • Architect and implement advanced self-healing and auto-remediation frameworks to reduce the need for manual incident intervention.
  • Set the standard for blameless postmortems and lead the analysis of the most complex, cross-functional system failures.
  • Occasionally participate in high-priority incidents to guide teams towards successful resolution in a timely manner.
  • Mentor Tech Leads and Senior SREs, fostering a culture where operations are treated as a software engineering discipline.
  • Collaborate with Product Development, Architecture, and Product Owners to align reliability goals with the business roadmap and to innovate on product software and infrastructure design.