About The Position

The Tech Risk and Controls Associate supports resiliency governance across CTC applications. You will partner with engineering teams to validate and continuously improve recovery strategies and architectures, contributing to design reviews and providing practical guidance on AWS and distributed systems resiliency. You will coordinate resiliency testing, curate evidence and metrics, and help ensure critical CTC business processes remain within impact tolerances during disruptions. This role blends hands-on technical governance with structured testing, architecture reviews, and data-driven insights.

Requirements

  • 2–4 years in technology resiliency, operational resilience, SRE/operations, technology risk/governance, or application engineering in cloud‑enabled environments.
  • Working knowledge of AWS and modern application architecture, including core resiliency constructs (Multi‑AZ basics, backup/restore, health checks, auto scaling, CloudWatch alarms).
  • Understanding of distributed systems basics and failure patterns (timeouts, retries, backoff, circuit breakers).
  • Experience coordinating tests/exercises and documenting results; ability to track remediation to completion.
  • Proficiency with Excel or similar for MI; able to interpret SLOs and RTO/RPO and communicate clear risk‑based updates.
  • Familiarity with Infrastructure as Code concepts (Terraform or CloudFormation) and automation guardrails; able to review for basic compliance/recovery readiness.
  • Strong communication, organization, and stakeholder coordination skills.

Nice To Haves

  • AWS Cloud Practitioner certification or higher (e.g., AWS Solutions Architect Associate), or a comparable certification.
  • Exposure to internal audit/regulatory reviews and evidence preparation.
  • Familiarity with SRE and chaos engineering concepts from a governance/support perspective.
  • Experience with JIRA and Confluence.

Responsibilities

  • Resiliency governance and documentation: Support the resiliency governance framework for CTC applications; align artifacts to firm policies and standards. Maintain and refresh business impact assessments, recovery strategies, plans, and runbooks; ensure version control and quality. Coordinate the annual test calendar (e.g., recovery strategy validation, failover, tabletop/MEPC) and assemble exam‑ready evidence.
  • Resiliency testing and validation: Schedule, prepare, and track resiliency tests; document results, issues, and remediation actions. Support controlled chaos experiments in partnership with engineering using firm‑approved tooling; maintain clear guardrails and safe blast radius; capture outcomes and lessons learned.
  • Resiliency architecture and technical collaboration: Participate in resilience design reviews; contribute questions and observations to strengthen recovery approaches. Apply working knowledge of AWS resiliency patterns (e.g., RDS Multi‑AZ, backups/PITR, S3 versioning, Route 53 health checks/failover, Auto Scaling, basic CloudWatch alarms). Understand common modern app patterns (microservices, REST/event‑driven) and failure handling (timeouts, retries, backoff, circuit breakers) to inform governance reviews.
  • Resiliency metrics and reporting: Help publish/track SLOs and recovery objectives (RTO/RPO, MTTR) for critical services; maintain dashboards/MI. Summarize trends on test coverage, control effectiveness, and issue aging for leadership updates.
  • Issue and risk management: Log and track issues, exceptions, and risk acceptances to durable closure; escalate when needed.
  • Stakeholder coordination and incident support: Coordinate across technology, cybersecurity, architecture, risk/compliance, and audit teams. Support incident/crisis events with notes, communications, and post‑incident action tracking.