Senior Site Reliability Engineer

CoderPad
10d · $170,000 - $180,000 · Remote

About The Position

CoderPad's mission is to create a stronger, more inclusive tech community. How? By improving the technical interviewing experience with tools that allow for standardization and consistency, while reducing bias and increasing equality of opportunity. We are looking for a Senior Site Reliability Engineer to join our Site Reliability Engineering team.

What does a Senior Site Reliability Engineer (SRE) at CoderPad do? As the global leader in the technical interview space (4,000+ customers in 165+ countries), CoderPad provides real-time, AI-augmented platforms that allow companies to evaluate developers on the latest technologies through live coding and collaborative environments. As a Senior SRE, you will ensure that the multi-cloud, Kubernetes-based platform powering these experiences remains reliable, scalable, secure, and cost-efficient. You will work at the intersection of platform engineering, reliability, and cloud infrastructure, enabling both a great customer experience and fast product delivery.

This role reports to the Engineering Manager, with technical leadership from the Head of SRE, and is based in North America, working EST hours.

Requirements

  • 5+ years of experience in SRE, DevOps, Platform Engineering, or Cloud Infrastructure roles.
  • Strong experience with AWS and GCP, including networking, IAM, compute, and managed services.
  • Hands-on experience running Kubernetes in production.
  • Strong knowledge of Terraform or equivalent infrastructure-as-code tooling.
  • Experience with CI/CD systems (e.g., GitLab CI or similar).
  • Solid understanding of observability (metrics, logs, traces) using tools like Datadog, Prometheus, Grafana, or similar.
  • Proficiency in at least one programming or scripting language (Go, Python, Node.js, or similar).
  • Strong Linux and Bash skills.
  • Experience operating high-availability, customer-facing SaaS platforms.

Responsibilities

  • Design, operate, and evolve production infrastructure across AWS, GCP, Heroku, and Kubernetes.
  • Own and improve monitoring, alerting, and SLOs for customer-facing services.
  • Lead and participate in incident response, postmortems, and long-term remediation.
  • Build and maintain infrastructure-as-code, CI/CD pipelines, and automation (Terraform, GitLab CI, Kubernetes tooling).
  • Drive scalability, performance, and resilience across a real-time SaaS platform.
  • Ensure security, patching, and operational hygiene across all environments.
  • Partner with product and engineering teams to enable safe, fast, and reliable releases.
  • Actively contribute to cost visibility and cloud optimization.

Benefits

  • Meaningful work with high impact for a well-loved product
  • Competitive, market-rate salaries
  • Stock options with a 4-year vesting schedule
  • Medical, dental, and vision insurance (90% covered for employees and dependents)
  • Flexible Spending Account (FSA)
  • 401(k) with profit sharing
  • Unlimited paid time off with an expectation of taking 3 weeks annually in addition to 20 company holidays
  • Remote-friendly environment with monthly WFH stipend
  • Parental leave (primary: 16 weeks; secondary: 12 weeks)
  • Short- and long-term disability and life insurance coverage
  • Choice of laptop computer
  • Internal mobility and growth opportunities
  • And more…