Site Reliability Engineer, Cloud Infrastructure

Quizlet
San Francisco, CA
$120,000 - $168,488
Onsite

About The Position

We are looking for a Site Reliability Engineer (SRE) to join our infrastructure team and help build reliable, scalable, and efficient systems. As an SRE, you'll blend software engineering expertise with systems knowledge to improve uptime, enhance performance, and reduce operational toil. We're happy to share that this is an onsite position in our San Francisco office. To help foster team collaboration, we require employees to be in the office at least three days per week (Monday, Wednesday, and Thursday), and additionally as needed by your manager or the company. We believe this working environment increases work efficiency, strengthens team partnership, and supports growth for both employees and the organization.

Requirements

  • 2+ years of professional experience in SRE, DevOps, Platform Engineering, or related infrastructure roles
  • Previous internship or professional experience writing code in a software development role (backend, full-stack, or similar)
  • Solid programming skills in languages such as Python, Go, or PHP
  • Familiarity with CI/CD systems and infrastructure-as-code tools (e.g., Terraform, GitHub Actions)
  • Understanding of Linux systems, networking fundamentals, and cloud-native concepts

Nice To Haves

  • Exposure to Kubernetes, container orchestration, or service mesh technologies is a plus
  • Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog) is helpful
  • A growth mindset with interest in continuous improvement, root cause analysis, and reducing operational burden
  • Good communication and collaboration skills; comfortable working across teams

Responsibilities

  • Monitor and maintain the reliability and uptime of our systems and services through effective alerting, incident response, and resilient design patterns
  • Write automation scripts and tools for deployments, infrastructure management, and operational tasks to reduce manual effort (toil)
  • Contribute to observability improvements (metrics, logging, tracing) to enhance visibility into systems and applications
  • Work with product and engineering teams to ensure systems are designed with scalability and resilience in mind
  • Participate in post-incident reviews and implement action items to prevent recurrence
  • Support and optimize infrastructure (e.g., Kubernetes, cloud platforms like GCP/AWS)
  • Learn and apply SRE best practices, including SLOs, SLIs, and error budgets

Benefits

  • Salary transparency helps mitigate unfair hiring practices such as discrimination and pay gaps
  • Total compensation for this role is market competitive, including a starting base salary of $120,000 - $168,488, depending on location and experience, as well as company stock options
  • Collaborate with your manager and team to create a healthy work-life balance
  • 20 vacation days that we expect you to take!
  • Competitive health, dental, and vision insurance (100% covered for employees and 75% for dependents; PPO, dental, and VSP Choice plans)
  • Employer-sponsored 401k plan with company match
  • Access to LinkedIn Learning and other resources to support professional growth
  • Paid Family Leave, FSA, HSA, Commuter benefits, and Wellness benefits
  • 40 hours of annual paid time off to participate in volunteer programs of your choice