Staff Site Reliability Engineer - Platform

Quizlet
San Francisco, CA
Onsite

About The Position

At Quizlet, our mission is to help every learner achieve their outcomes in the most effective and delightful way. Our $1B+ learning platform serves tens of millions of students every month, including two-thirds of U.S. high schoolers and half of U.S. college students, powering over 2 billion learning interactions monthly. We blend cognitive science with machine learning to personalize and enhance the learning experience for students, professionals, and lifelong learners alike. We’re energized by the potential to power more learners through multiple approaches and various tools.

Let’s Build the Future of Learning

Join us to design and deliver AI-powered learning tools that scale across the world and unlock human potential.

About the Role

As a Staff Site Reliability Engineer, you’ll lead reliability engineering across Quizlet’s platform: designing automation, scaling systems, and ensuring that our infrastructure can support rapid innovation in AI-powered learning. You’ll drive the architectural direction for resilience, observability, and performance while mentoring other engineers and influencing platform-wide standards.

We’re happy to share that this is an onsite position in our San Francisco office. To help foster team collaboration, we require that employees be in the office a minimum of three days per week (Monday, Wednesday, and Thursday), plus additional days as needed by your manager or the company. We believe that this working environment increases work efficiency, strengthens team partnership, and supports growth as an employee and organization.

Requirements

  • 8+ years of experience in SRE, systems, or infrastructure engineering
  • Expertise in Kubernetes (GKE), Terraform, and CI/CD pipelines (ArgoCD, GitHub Actions, CircleCI)
  • Deep programming skills in Go and/or Python for infrastructure automation
  • Strong experience with Datadog, system monitoring, and distributed tracing
  • Familiarity with GCP services, Linux internals, and large-scale networking
  • Proven experience leading cross-team reliability initiatives and architectural improvements

Responsibilities

  • Lead the design and implementation of self-healing, auto-scaling infrastructure across our Kubernetes and Istio environments
  • Architect and implement CI/CD reliability improvements that reduce MTTR and deployment risk
  • Partner with teams to define and enforce SLOs and operational excellence standards
  • Build systems and tools that enable proactive reliability and capacity management
  • Drive incident analysis and postmortems using Datadog and Jeli to identify architectural improvements
  • Mentor engineers and establish best practices for automation, observability, and scaling

Benefits

  • Collaborate with your manager and team to create a healthy work-life balance
  • 20 vacation days that we expect you to take!
  • Competitive health, dental, and vision insurance (100% employee and 75% dependent coverage on PPO, Dental, and VSP Choice plans)
  • Employer-sponsored 401(k) plan with company match
  • Access to LinkedIn Learning and other resources to support professional growth
  • Paid Family Leave, FSA, HSA, Commuter benefits, and Wellness benefits