Senior Site Reliability Engineer (SRE)

Voya Financial
Atlanta, GA
Hybrid

About The Position

We’re seeking a seasoned Site Reliability Engineer (SRE) who thrives at the intersection of software engineering, infrastructure, and AI systems. You’ll help ensure our platforms are scalable, reliable, and secure—while also contributing code, automation, and architectural improvements that support both traditional services and AI-driven workloads. This role is ideal for someone who thinks like a developer, understands AI infrastructure, and is passionate about reliability, observability, and operational excellence.

Requirements

  • 5+ years of experience in SRE, DevOps, or software engineering roles.
  • Strong programming skills in languages such as Python or Java.
  • Experience supporting AI/ML workloads (e.g., model training, inference, GPU orchestration).
  • Deep understanding of Linux systems, cloud platforms (primarily Azure and AWS), and container orchestration.
  • Experience with infrastructure-as-code and version-control tooling (Terraform, Ansible, GitHub, etc.).
  • Proficiency in monitoring and logging tools such as Dynatrace.
  • Solid grasp of networking, security, and distributed systems.
  • Excellent communication and collaboration skills.

Nice To Haves

  • Experience with AI model observability, drift detection, or performance monitoring.
  • Contributions to open-source SRE, DevOps, or ML infrastructure tools.
  • Certifications in cloud platforms.

Responsibilities

  • Design, build, and maintain scalable infrastructure and automation tools for both traditional and AI-based systems.
  • Develop software solutions to improve system reliability and reduce manual toil.
  • Implement and manage CI/CD pipelines, including model deployment workflows.
  • Monitor system performance, availability, and security using modern observability tools.
  • Collaborate with data science and ML engineering teams to support AI/ML model training, serving, and lifecycle management.
  • Lead incident response, root cause analysis, and postmortem processes.
  • Advocate for SRE principles across engineering and AI teams.

Benefits

  • Health, dental, vision and life insurance plans
  • 401(k) Savings plan – with generous company matching contributions (up to 6%)
  • Voya Retirement Plan – employer paid cash balance retirement plan (4%)
  • Tuition reimbursement up to $5,250/year
  • Paid time off – 20 days, plus nine paid company holidays and a flexible Diversity Celebration Day
  • Paid volunteer time – 40 hours per calendar year