Site Reliability Engineer, Cloud Infrastructure

Quizlet
San Francisco, CA
$120,000 - $168,488
Onsite

About The Position

At Quizlet, our mission is to help every learner achieve their outcomes in the most effective and delightful way. Our $1B+ learning platform serves tens of millions of students every month, including two-thirds of U.S. high schoolers and half of U.S. college students, powering over 2 billion learning interactions monthly. We blend cognitive science with machine learning to personalize and enhance the learning experience for students, professionals, and lifelong learners alike. We’re energized by the potential to power more learners through multiple approaches and various tools.

Let’s Build the Future of Learning

Join us to design and deliver AI-powered learning tools that scale across the world and unlock human potential.

About the Role

We are looking for a Site Reliability Engineer (SRE) to join our infrastructure team and help build reliable, scalable, and efficient systems. As an SRE, you'll blend software engineering expertise with systems knowledge to improve uptime, enhance performance, and reduce operational toil.

We’re happy to share that this is an onsite position in our San Francisco office. To help foster team collaboration, we require that employees be in the office a minimum of three days per week (Monday, Wednesday, and Thursday), plus additional days as needed by your manager or the company. We believe that this working environment increases work efficiency, strengthens team partnership, and supports growth as an employee and organization.

Requirements

  • 2+ years of professional experience in SRE, DevOps, Platform Engineering, or related infrastructure roles
  • Previous internship or professional experience writing code in a software development role (backend, full-stack, or similar)
  • Solid programming skills in languages such as Python, Go, or PHP
  • Familiarity with CI/CD systems and infrastructure-as-code tools (e.g., Terraform, GitHub Actions)
  • Understanding of Linux systems, networking fundamentals, and cloud-native concepts
  • A growth mindset with interest in continuous improvement, root cause analysis, and reducing operational burden
  • Good communication and collaboration skills; comfortable working across teams

Nice To Haves

  • Exposure to Kubernetes, container orchestration, or service mesh technologies
  • Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog)

Responsibilities

  • Monitor and maintain the reliability and uptime of our systems and services through effective alerting, incident response, and resilient design patterns
  • Write automation scripts and tools for deployments, infrastructure management, and operational tasks to reduce manual effort (toil)
  • Contribute to observability improvements (metrics, logging, tracing) to enhance visibility into systems and applications
  • Work with product and engineering teams to ensure systems are designed with scalability and resilience in mind
  • Participate in post-incident reviews and implement action items to prevent recurrence
  • Support and optimize infrastructure (e.g., Kubernetes, cloud platforms like GCP/AWS)
  • Learn and apply SRE best practices, including SLOs, SLIs, and error budgets

Benefits

  • Collaborate with your manager and team to create a healthy work-life balance
  • 20 vacation days that we expect you to take!
  • Competitive health, dental, and vision insurance (100% employee and 75% dependent PPO, Dental, VSP Choice)
  • Employer-sponsored 401k plan with company match
  • Access to LinkedIn Learning and other resources to support professional growth
  • Paid Family Leave, FSA, HSA, Commuter benefits, and Wellness benefits
  • 40 hours of annual paid time off to participate in volunteer programs of choice