About The Position

This role sits at the intersection of infrastructure resiliency, recovery planning, and enterprise execution. It is designed for a strong technical lead who can coordinate large-scale deliverables across engineering, operations, risk, and control teams while keeping complex work moving toward clear outcomes. The position focuses on Enterprise Critical Infrastructure (ECI) planning, restoral testing, documentation, and governance, with relevance to public cloud, hybrid infrastructure, storage, backup, disaster recovery, observability, and broader infrastructure operational resiliency. Success in this role requires both delivery discipline and enough technical depth to engage credibly with subject matter experts (SMEs), challenge assumptions, and turn recovery requirements into practical, audit-ready execution.

The Planning and Testing Lead will drive critical resiliency work across the firm’s infrastructure landscape, partnering with ECI Owners, application teams, infrastructure SMEs, and control partners to develop plans, coordinate testing, and improve recovery readiness. This is a technical leadership role for someone who understands how modern infrastructure is built and restored across public cloud, hybrid environments, storage, backup, disaster recovery, observability, and SRE-aligned operating models. The person in this role will lead development and maintenance of ECI Restoral Plans, Restoral Test Plans, and execution documentation, ensuring the work is complete, traceable, and aligned with enterprise standards and control expectations.

The ideal candidate is likely to come from infrastructure operations, disaster recovery, platform engineering, or resiliency-focused technical leadership roles. They should be comfortable organizing SMEs, navigating cross-team dependencies, and translating technical recovery complexity into clear plans, evidence, and sustainable operating routines.

Requirements

  • 7+ years of experience in technology, infrastructure operations, infrastructure resiliency, disaster recovery, restoration planning, recovery testing, technology risk, or a closely related technical execution role.
  • Experience leading complex, cross-functional technical deliverables across multiple stakeholder groups, including infrastructure SMEs, application teams, control partners, and technology leaders.
  • Working knowledge of public cloud and hybrid infrastructure environments, with emphasis on AWS and/or Azure, hybrid compute, storage, backup, and disaster recovery/restoral capabilities.
  • Experience with technical infrastructure documentation, restoral planning, recovery testing, operational resiliency processes, or infrastructure risk assessments.
  • Ability to provide technical leadership over SMEs, challenge assumptions, organize technical inputs, and translate infrastructure recovery requirements into clear planning, testing, and governance deliverables.
  • Strong understanding of infrastructure dependencies, recovery sequencing, test evidence, observations, after-action reporting, and remediation tracking.
  • Experience facilitating structured working sessions, review forums, and approval processes with technical SMEs, control partners, and senior leaders.
  • Demonstrated ability to manage risks, issues, milestones, and deliverables in a controlled technology environment.
  • Strong analytical, communication, and partnership skills with the ability to work across infrastructure, application, risk, and compliance stakeholders.
  • Ability to work under pressure and manage competing requirements while maintaining quality, control discipline, and delivery focus.
  • Collaboration
  • Project Management
  • Result Orientation
  • Solution Delivery Process
  • Stakeholder Management
  • Analytical Thinking
  • Business Acumen
  • Influence
  • Risk Management
  • Solution Design
  • Technical Strategy Development
  • Infrastructure Operations
  • Disaster Recovery
  • Operational Resiliency
  • Cloud Infrastructure
  • Storage and Backup
  • Observability
  • SRE Practices

Nice To Haves

  • Experience with Enterprise Critical Infrastructure planning, restoral testing, or infrastructure operational resiliency governance.
  • Experience with public cloud infrastructure, including AWS and/or Azure, and how cloud-hosted services are recovered, tested, monitored, and governed in a hybrid enterprise environment.
  • Experience with storage, backup, disaster recovery, data protection, infrastructure recovery testing, or recovery evidence validation.
  • Experience with observability, monitoring, SRE practices, service health indicators, or operational readiness measures used to assess infrastructure recovery or restoral health.
  • Experience with risk assessments, control requirements, audit-facing deliverables, and evidence-based governance processes.
  • Experience with dependency mapping, sequencing/timing analysis, technical workarounds, and infrastructure recovery order validation.
  • Experience supporting documentation that requires formal approvals, version control, traceability, and audit-ready evidence.
  • Strong written communication skills with the ability to produce clear, structured, technical documentation for senior technology, risk, and control audiences.
  • Experience with automation, scripting, or infrastructure-as-code approaches such as Terraform, Ansible, PowerShell, or Python to standardize, validate, or scale operational processes.

Responsibilities

  • Lead kickoff meetings and recurring working sessions with ECI Owners, infrastructure SMEs, application teams, control partners, and technology leaders to plan, track, and execute ECI planning and testing deliverables.
  • Coordinate development and ongoing maintenance of ECI Restoral Plans, Restoral Test Plans, Restoral Test Execution documentation, test evidence, observations, after-action reporting, and remediation tracking.
  • Provide technical leadership across infrastructure SMEs supporting public cloud, hybrid hosting, storage, backup, and disaster recovery/restoral domains.
  • Drive execution of documentation and testing milestones for assigned ECIs, escalating risks, dependencies, and delivery concerns as needed to meet required timelines.
  • Facilitate inline quality assurance and forum-based quality control reviews; incorporate feedback through checklists, review routines, and feedback trackers.
  • Coordinate review and approval workflows with ECI Owners, senior leaders, infrastructure SMEs, and control partners for restoral plans, test plans, test results, improvement options, and supporting evidence.
  • Support mapping of Prioritized Critical Service dependencies to ECIs and maintain dependency mapping outputs through recurring review routines.
  • Analyze infrastructure recovery sequencing and timing information, identify dependency conflicts or circular dependencies, and partner with stakeholders to document workarounds, recovery order updates, and improvement recommendations.
  • Coordinate ECI restoral testing activities, including representative sample testing, tabletop testing, evidence collection, observations, after-action reporting, and remediation tracking.
  • Support Maximum Tolerable Downtime activities by applying defined methodology inputs, documenting results, and helping identify ECIs requiring risk assessment or improvement options.
  • Partner with ECI Owners and leadership to develop, quality-review, socialize, and reach decisions on restoral improvement options and recommendations for ECIs that exceed assigned recovery expectations.
  • Ensure final documentation, approvals, testing evidence, and remediation artifacts are maintained in appropriate repositories and aligned to document management, governance, and audit-ready traceability expectations.
  • Track success metrics and provide status updates to stakeholders and leadership pertaining to target outcomes, delivery, performance, risks, issues, and schedule.
  • Collaborate with sponsors and stakeholders to ensure execution is aligned with deliverable requirements, enterprise change expectations, and resiliency governance objectives.
  • Lead and coordinate routines that support delivery of large programs, such as kick-offs, status reviews, stakeholder meetings, change controls, and tollgates.
  • Broaden relationships with business and technology leaders across multiple organizations, as well as with Compliance and Risk.
  • Establish target outcomes in partnership with stakeholders and leaders.
  • Manage program financials and support resource planning.
  • Ensure adherence to Enterprise Change Management standards.

Benefits

  • Affordable, competitive, and flexible benefits
  • Opportunities to learn, grow, and make an impact