Senior Site Reliability Engineer

CoderPad

$170,000 - $180,000 • Remote

About The Position

CoderPad's mission is to create a more inclusive and strong tech community. How? By improving the technical interviewing experience with tools that allow for standardization and consistency, while reducing bias and increasing equality of opportunity. We are looking for a Senior Site Reliability Engineer to join our Site Reliability Engineering team.

What does a Senior Site Reliability Engineer (SRE) at CoderPad do? As the global leader in the technical interview space (4,000+ customers in 165+ countries), CoderPad provides real-time, AI-augmented platforms that allow companies to evaluate developers on the latest technologies through live coding and collaborative environments. As a Senior SRE, you will ensure that the multi-cloud, Kubernetes-based platform powering these experiences remains reliable, scalable, secure, and cost-efficient. You will work at the intersection of platform engineering, reliability, and cloud infrastructure, enabling both a great customer experience and fast product delivery.

This role reports to the Engineering Manager, with technical leadership from the Head of SRE, and is based in North America, working EST hours.

Requirements

5+ years of experience in SRE, DevOps, Platform Engineering, or Cloud Infrastructure roles.
Strong experience with AWS and GCP, including networking, IAM, compute, and managed services.
Hands-on experience running Kubernetes in production.
Strong knowledge of Terraform or equivalent infrastructure-as-code tooling.
Experience with CI/CD systems (e.g., GitLab CI or similar).
Solid understanding of observability (metrics, logs, traces) using tools like Datadog, Prometheus, Grafana, or similar.
Proficiency in at least one programming or scripting language (Go, Python, Node.js, or similar).
Strong Linux and Bash skills.
Experience operating high-availability, customer-facing SaaS platforms.

Responsibilities

Design, operate, and evolve production infrastructure across AWS, GCP, Heroku, and Kubernetes.
Own and improve monitoring, alerting, and SLOs for customer-facing services.
Lead and participate in incident response, postmortems, and long-term remediation.
Build and maintain infrastructure-as-code, CI/CD pipelines, and automation (Terraform, GitLab CI, Kubernetes tooling).
Drive scalability, performance, and resilience across a real-time SaaS platform.
Ensure security, patching, and operational hygiene across all environments.
Partner with product and engineering teams to enable safe, fast, and reliable releases.
Actively contribute to cost visibility and cloud optimization.

Benefits

Meaningful work with high impact for a well-loved product
Competitive, market-rate salaries
Stock options with a 4-year vesting schedule
Medical, dental, and vision insurance (90% covered for employees and dependents)
Flexible Spending Account (FSA)
401(k) with profit sharing
Unlimited paid time off with an expectation of taking 3 weeks annually in addition to 20 company holidays
Remote-friendly environment with monthly WFH stipend
Parental leave (primary: 16 weeks; secondary: 12 weeks)
Short- and long-term disability and life insurance coverage
Choice of laptop computer
Internal mobility and growth opportunities
And more…