Senior Site Reliability Engineer

Prodigy Education
Toronto, ON
Remote

About The Position

As a Senior Site Reliability Engineer at Prodigy, you will join a high-leverage Infrastructure team that owns our cloud platform end-to-end. You aren't just "using" tools; you are building the foundation that allows millions of students to access adaptive learning. You will manage Kubernetes clusters, GitOps pipelines, and Terraform-defined AWS environments while creating the internal tooling that empowers our entire engineering organization.

Requirements

  • 5+ years in SRE, Platform, or Infrastructure roles running production systems at scale.
  • Deep understanding of Kubernetes (K8s) internals, with experience debugging complex failures and managing manifests via Helm or Kustomize.
  • Advanced proficiency in AWS (IAM, Networking, EKS) and writing reusable, modular Terraform.
  • Ability to write clean, maintainable code in Go or Python to build production-grade internal tooling.
  • High bar for written clarity, essential for postmortems and documentation in our remote-friendly environment.

Nice To Haves

  • Experience with GitOps workflows using ArgoCD.
  • Hands-on experience profiling or optimizing Node.js/TypeScript services.
  • Knowledge of Service Mesh architectures or Kubernetes Gateway API.
  • Background in EdTech or high-concurrency consumer platforms.

Responsibilities

  • Own and modernize significant systems across EKS, ArgoCD, and AWS to ensure the platform scales with our growing student base.
  • Write and maintain high-quality Terraform and Helm code that serves as the standard for other product teams.
  • Build and maintain Go/Python-based CLIs and automation that simplify the developer experience for every engineer at Prodigy.
  • Participate in on-call rotations and lead incident response, turning production "fires" into permanent architectural improvements.
  • Optimize Datadog instrumentation and profile Node.js workloads to find and fix performance bottlenecks before they impact users.

Benefits

  • Inspirational mission and rewarding work
  • Total Rewards Program reflecting commitment to financial, physical, and mental well-being
  • Modern Stack: Work with cutting-edge tools like ArgoCD, Kubernetes Gateway API, and Drata for compliance automation.
  • Culture of Learning: We prioritize "correct over quick," focusing on postmortems that lead to real improvements rather than finger-pointing.