About The Position

Investment Banking is a technology‑centric business driven by real‑time processing, sophisticated integrated systems, and vast data access, which makes technology critical to business success. We are seeking a visionary and experienced Vice President, Lead Site Reliability Engineer (SRE) to join the Investment Banking Chief Technology Office (IB CTO) team in Cary, US. This role will be instrumental in shaping the strategic direction and execution of SRE across critical applications and platforms. As a senior leader, you will elevate the overall reliability posture, drive architectural resilience, and champion the adoption of cloud‑native patterns across a diverse application portfolio. You will translate complex technical challenges into actionable SRE roadmaps and execute them across multiple global teams and technology stacks. The role also focuses on proactively mitigating systemic risk, optimizing cost efficiency through SRE principles, and providing technical thought leadership for the highly complex, distributed systems that underpin core Investment Banking functions.

Requirements

  • Deep mastery of SRE practices (SLOs/SLIs, error budgets, incident management) with a proven ability to drive SRE adoption and cultural change across teams
  • Expert in designing and optimizing large‑scale GCP platforms (GKE, IAM, networking, security, data services), with multi‑cloud or hybrid experience a plus
  • Hands‑on leadership experience operating large production Kubernetes environments, including service mesh and shared platform capabilities
  • Extensive experience leading Terraform‑based IaC, GitOps deployments (ArgoCD / FluxCD), and modern CI/CD‑driven SDLC transformations
  • Advanced observability and AIOps expertise in monitoring, alerting and logging strategies, backed by strong programming skills (e.g. Python, Go, Java) to deliver scalable automation and shared tooling
  • Deep expertise in diagnosing and resolving complex, production‑critical issues through rigorous root‑cause analysis across diverse application domains
  • Proven leader with the ability to influence, align cross‑functional teams, and clearly communicate complex technical topics to both technical and senior business stakeholders

Nice To Haves

  • Experience in highly regulated environments, ideally financial services, with strong understanding of compliance and security requirements for critical infrastructure
  • Excellent communicator who bridges technical risk and business impact for non‑technical stakeholders and actively mentors engineers to promote scalable knowledge‑sharing

Responsibilities

  • Lead the platform reliability, performance, and scalability strategy across highly complex, distributed systems on GCP and on‑prem, providing architectural guidance to ensure resilience and fault tolerance across IB CTO applications
  • Define and institutionalize SRE operational excellence, including advanced incident management, blameless post‑mortems, and proactive problem‑prevention practices across engineering teams
  • Drive automation and tooling innovation to reduce toil across multiple applications, leveraging advanced automation, self‑healing capabilities, and operational intelligence, while mentoring engineers on sustainable solutions
  • Establish and drive enterprise‑wide adoption of SLIs/SLOs for mission‑critical services, aligning reliability metrics with business objectives and communicating outcomes to senior leadership and stakeholders
  • Act as a trusted technical advisor across application teams, leading cross‑functional initiatives to improve system stability, reliability culture, and complex troubleshooting
  • Provide architectural stewardship across Infrastructure as Code (IaC), capacity planning, and operational documentation, ensuring scalability, cost efficiency, security, disaster recovery, and knowledge sharing across the portfolio
  • Partner closely with application development leads and platform engineering teams to embed SRE principles into system design and delivery
  • Act as a trusted technical advisor to senior technology and business stakeholders on reliability, risk, and operational resilience topics
  • Foster a shared culture of reliability, learning, and continuous improvement across geographically distributed teams

Benefits

  • A hybrid working model, allowing for in-office / work from home flexibility
  • Generous vacation, personal and volunteer days
  • Employee Resource Groups support an inclusive workplace for everyone and promote community engagement
  • Competitive compensation packages
  • Health and wellbeing benefits
  • Retirement savings plans
  • Parental leave
  • Family building benefits
  • Educational resources
  • Matching gift and volunteer programs