Site Reliability Engineer, Cloud Infrastructure

Quizlet•San Francisco, CA

29d•$120,000 - $168,488•Onsite

About The Position

We are looking for a Site Reliability Engineer (SRE) to join our infrastructure team and help build reliable, scalable, and efficient systems. As an SRE, you'll blend software engineering expertise with systems knowledge to improve uptime, enhance performance, and reduce operational toil. We’re happy to share that this is an onsite position in our San Francisco office. To help foster team collaboration, we require employees to be in the office a minimum of three days per week (Monday, Wednesday, and Thursday) and as needed by your manager or the company. We believe this working environment increases work efficiency, strengthens team partnership, and supports growth for both employees and the organization.

Requirements

2+ years of professional experience in SRE, DevOps, Platform Engineering, or related infrastructure roles
Previous internship or professional experience writing code in a software development role (backend, full-stack, or similar)
Solid programming skills in languages such as Python, Go, or PHP
Familiarity with CI/CD systems and infrastructure-as-code tools (e.g., Terraform, GitHub Actions)
Understanding of Linux systems, networking fundamentals, and cloud-native concepts

Nice To Haves

Exposure to Kubernetes, container orchestration, or service mesh technologies is a plus
Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog) is helpful
A growth mindset with an interest in continuous improvement, root cause analysis, and reducing operational burden
Good communication and collaboration skills; comfortable working across teams

Responsibilities

Monitor and maintain the reliability and uptime of our systems and services through effective alerting, incident response, and resilient design patterns
Write automation scripts and tools for deployments, infrastructure management, and operational tasks to reduce manual effort (toil)
Contribute to observability improvements (metrics, logging, tracing) to enhance visibility into systems and applications
Work with product and engineering teams to ensure systems are designed with scalability and resilience in mind
Participate in post-incident reviews and implement action items to prevent recurrence
Support and optimize infrastructure (e.g., Kubernetes, cloud platforms like GCP/AWS)
Learn and apply SRE best practices, including SLOs, SLIs, and error budgets

Benefits

Salary transparency helps mitigate unfair hiring practices related to discrimination and pay gaps.
Total compensation for this role is market competitive, including a starting base salary of $120,000 - $168,488, depending on location and experience, as well as company stock options
Collaborate with your manager and team to create a healthy work-life balance
20 vacation days that we expect you to take!
Competitive health, dental, and vision insurance (100% for employees and 75% for dependents; PPO, dental, and VSP Choice plans)
Employer-sponsored 401k plan with company match
Access to LinkedIn Learning and other resources to support professional growth
Paid family leave, FSA, HSA, commuter benefits, and wellness benefits
40 hours of annual paid time off to participate in volunteer programs of your choice
