About The Position

Voya is seeking a Director of Site Reliability Engineering (SRE) & Testing Practices to join our Digital & Wealth Management Technology organization. This role reports to the VP of Platform Engineering and plays a critical leadership role in ensuring the reliability, quality, and resiliency of Voya’s participant, advisor, and plan sponsor digital platforms.

This Director is accountable for two enterprise‑level, horizontal practices across all Digital & Wealth Management squads:

  • Site Reliability Engineering (SRE): How reliably Voya’s platforms operate in production.
  • Testing & Quality Engineering: How quality is defined, measured, and enforced before software reaches production.

These responsibilities are intentionally combined. At Voya, quality and reliability are not separate handoffs; they are a continuous feedback loop. Testing establishes the quality bar that prevents incidents. SRE measures what escapes and uses those insights to raise the bar further.

A third dimension makes this role unique: AI‑augmented engineering. AI is not a future aspiration here; it is a core lever to improve reliability and quality faster than scale alone. This role leads the practical adoption of AI tooling across SRE and testing workflows and ensures those tools produce measurable outcomes, not just experimentation.

This is a Director‑level leadership role with meaningful technical expectations. You will lead a team of senior SRE and Quality Engineering leaders, embed practices across squads through influence, and partner closely with senior technology leadership to represent platform health at an executive level.

Requirements

  • Bachelor’s degree in Computer Science, Software Engineering, or related field required
  • 12+ years of progressive engineering experience
  • 4+ years leading SRE, quality engineering, or platform reliability functions at a Director level or equivalent
  • Demonstrated experience owning reliability and testing practices across multiple squads or portfolios
  • Experience in financial services, fintech, or other highly regulated, customer‑facing environments
  • Proven track record of improving reliability, quality, and operational outcomes at scale
  • Deep expertise in SRE practices: SLOs/SLIs, incident management, observability, chaos engineering, and capacity planning
  • Strong testing & quality engineering background, including contract testing, performance testing, accessibility, and security testing
  • Hands‑on experience with AI‑assisted engineering tools and applying AI meaningfully in testing or reliability workflows
  • Proficiency in TypeScript; working knowledge of Rust; familiarity with distributed systems
  • Understanding of reliability expectations in retirement, wealth management, or regulated financial platforms

Nice To Haves

  • Master’s degree preferred (or equivalent professional experience)
  • SRE or DevOps certifications (Google, AWS, or equivalent)
  • Experience building or extending AI‑powered observability or testing tooling
  • Open‑source contributions or conference presentations (SREcon, QeCon, KubeCon, re:Invent)
  • Experience with model risk management frameworks and AI governance in regulated environments

Responsibilities

  • Own Site Reliability Engineering Across Digital & Wealth Management: Define and govern SRE standards across all participant‑ and advisor‑facing services, including SLOs/SLIs, error budgets, alerting quality, and reliability governance. Own incident management and response standards, including on‑call health, severity classification, escalation paths, post‑incident reviews, and action‑item follow‑through. Lead observability strategy and tooling across the portfolio (logging, tracing, metrics, alerting), including monitoring for AI‑enabled features. Direct chaos engineering and resilience testing programs to surface risk before it impacts participants. Lead capacity planning for participant enrollment peaks, open enrollment periods, and market‑driven traffic surges. Own and report reliability health metrics for senior leadership, connecting technical outcomes to participant experience and risk posture.
  • Own Testing Strategy & Quality Engineering: Define and enforce the end‑to‑end testing strategy across squads, covering unit, integration, contract, end‑to‑end, performance, accessibility, and security testing. Govern consumer‑driven contract testing across service boundaries and ensure visibility of contract health across teams. Own performance, load, and scalability testing for critical participant and advisor journeys. Establish accessibility (WCAG 2.2 AA) as a non‑negotiable quality standard. Own security testing integration within CI/CD pipelines as blocking quality gates. Define test data strategies using compliant, production‑representative synthetic data for retirement domain scenarios. Report quality health metrics alongside reliability metrics in executive reviews.
  • Lead AI‑Augmented Engineering Practices: Own the AI tooling roadmap for SRE and quality engineering, with adoption and ROI accountability. Lead deployment and governance of AI coding assistants and AI‑powered testing tools within financial services constraints. Embed AI‑powered quality gates, test analysis, and incident insights into CI/CD and operational workflows. Define reliability and observability standards for AI‑enabled, participant‑facing features. Partner with AI innovation, risk, and compliance teams to ensure AI features meet fiduciary and regulatory expectations. Evangelize AI‑augmented engineering through demos, enablement sessions, and measurable success stories.
  • Production Readiness, Governance & Risk: Own production readiness standards for all new services and significant changes. Define enhanced reliability standards for AI‑powered features with fiduciary implications. Govern change risk assessment, protected deployment windows, feature flagging, and progressive rollouts. Partner with Compliance and Risk on technology risk identification and mitigation.
  • Tooling, Developer Experience & Practice Leadership: Own the SRE, testing, and observability toolchain across Digital & Wealth Management. Define CI/CD quality gates and environment strategies in partnership with platform engineering. Maintain internal knowledge bases, runbooks, and postmortem repositories. Lead, hire, and develop a high‑caliber team of senior SRE and Quality Engineering leaders. Build a thriving cross‑squad community of practice that raises standards through influence. Partner with platform leadership on quarterly engineering health reporting.

Benefits

  • Health, dental, vision and life insurance plans
  • 401(k) Savings plan – with generous company matching contributions (up to 6%)
  • Voya Retirement Plan – employer paid cash balance retirement plan (4%)
  • Tuition reimbursement up to $5,250/year
  • Paid time off – including 20 days paid time off, nine paid company holidays and a flexible Diversity Celebration Day.
  • Paid volunteer time – 40 hours per calendar year