Sr Site Reliability Engineer

Renaissance Learning North America

1d • $109,500 - $150,550 • Remote

About The Position

Renaissance is looking for an experienced Sr Site Reliability Engineer to join the Engineering Enablement group’s Site Reliability Team, with a focus on application and infrastructure availability, reliability, observability, and security. Our current team is at a turning point in its evolution, and we are looking for someone who has been involved in the SRE implementation journey at other companies. We want someone who will influence our SRE philosophy and practices, and who is a self-motivated problem solver, communicates well, and values teamwork. You will apply your technical expertise to build and scale our highly available distributed SaaS platform used by millions of K-12 students worldwide.

Requirements

5+ years of experience focused on SRE.
Experience in managing & monitoring containerized cloud environments in production, preferably AWS EKS.
Experience with IaC, Configuration Management and Orchestration Tools like Terraform/Docker/Ansible.
Hands-on experience with a programming or scripting language such as .NET/Java, Python, or JavaScript.
On-call experience & willingness to be on call outside working hours and on weekends.
Experience working in an agile environment.
Applicants must be authorized to work for any employer in the United States. We are unable to sponsor or take over sponsorship of an employment visa at this time.

Nice To Haves

BS in Information Systems or Computer Science, related field experience, or both.
Managing Kubernetes clusters, including EKS, at scale using Helm.
Setting up GitLab & GitHub pipelines & workflows.
Experience setting up monitoring, logging, alerting & observability in tools such as New Relic, Datadog, Grafana, CloudWatch, PagerDuty.
Experience with Teleport, HashiCorp Boundary, etc.
Experience with Redshift, OpenSearch/zero-ETL.
Experience running Disaster Recovery exercises.
Implementing service level objectives (SLOs/SLIs/SLAs) & error budgets.
Experience using Claude Code for agentic coding and an agentic SDLC, and enabling/rolling out agentic DX.

Responsibilities

Work with engineering, security & governance teams to improve the observability, reliability, resiliency, and auditability of our systems and to minimize/prevent downtime.
Contribute to infrastructure-as-code using Terraform & CloudFormation.
Support CI/CD pipelines that ensure the prompt release of high-quality software.
Collaborate with cross-functional teams to resolve infrastructure issues.
Perform Disaster Recovery exercises on our products.
Explore and integrate AI tooling into SRE workflows.
Be part of an on-call rotation & support off-hours incidents & deployments.
Demonstrate strong skills in giving constructive feedback through coaching, even without direct reports.