Site Reliability Engineer - Disaster Recovery & Business Continuity

Charles River Associates, Boston, MA
Hybrid

About The Position

The IT Business Continuity Coordinator supports the firm’s operational resilience with a primary focus on IT service continuity and disaster recovery (DR) readiness (approximately 80%), and a secondary focus on coordinating business continuity planning with non-IT stakeholders (approximately 20%). This role partners closely with infrastructure, security, application, and service delivery teams to maintain actionable recovery documentation, validate recovery capabilities through testing, and drive ongoing improvements through remediation tracking and lessons learned.

Requirements

  • Experience supporting IT service continuity and/or disaster recovery (DR) for enterprise services and applications, including runbook maintenance and coordinating technical exercises
  • Working knowledge of resilience concepts and metrics (recovery time objective (RTO) and recovery point objective (RPO)), incident/change management, and IT service management practices
  • Ability to coordinate cross-functional IT stakeholders (infrastructure, cloud, network, security, applications) and manage timelines, communications, and documentation across multiple workstreams
  • Strong documentation and analytical skills; able to produce clear runbooks, test plans, after-action reports, and remediation tracking
  • Familiarity with core IT platforms and enterprise services (identity, networking, virtualization, backups, Windows/Microsoft 365, cloud/SaaS) to understand recovery dependencies
  • Comfort working with risk, compliance, and audit stakeholders; experience collecting evidence and supporting control attestations is a plus
  • Exposure to business continuity activities (business impact analysis (BIA) inputs, plan owner coordination, tabletop exercise facilitation) is helpful

Responsibilities

  • Maintain and continuously improve IT service continuity and DR documentation, including service recovery plans, application recovery procedures, and service dependency mappings.
  • Partner with infrastructure and application teams to document, review, and standardize DR runbooks (recovery steps, prerequisites, validation checks, and recovery sequencing).
  • Coordinate evidence collection and periodic validation of backups, restores, and data recovery procedures in collaboration with platform owners.
  • Plan and coordinate DR tests and resilience exercises, including scope, schedules, participant communications, success criteria, evidence collection, and after-action reporting.
  • Maintain remediation logs, drive follow-through on action items, and report status and risks to stakeholders; incorporate lessons learned into updated runbooks and plans.
  • Track plan review cadence, test completion, and key metrics; support audit-ready evidence collection and risk/compliance requests related to IT resilience.
  • Help assess IT resilience considerations for key vendors and dependencies (e.g., SaaS, telecom, data centers) and document contingency approaches with service owners.
  • During disruptions, support coordination of technical recovery status updates and stakeholder communications in partnership with IT incident management and leadership.
  • Support business impact analysis inputs (critical processes, contacts, workarounds) and coordinate periodic awareness/training for non-IT plan owners as needed.

Benefits

  • medical
  • dental
  • vision insurance
  • 401(k) retirement plan with employer match
  • life and disability insurance
  • paid time off (vacation, sick leave, holidays)
  • paid parental leave
  • wellness programs
  • employee assistance resources
  • commuter benefits