Site Reliability Engineer

Coterie
$120,000 - $155,000
Remote

About The Position

We're looking for a Site Reliability Engineer who's passionate about building and maintaining reliable, scalable infrastructure and who thrives on making systems better every day. In this role, you'll join our SRE team to help keep our platforms running smoothly, improve our observability and incident response capabilities, and partner with development teams to deliver infrastructure that supports high-quality, reliable software. You'll play a key role in managing our cloud infrastructure, strengthening our CI/CD pipelines, and helping us get the most out of our monitoring and alerting tools, particularly Grafana. This is a great opportunity for a mid-level engineer ready to take ownership of meaningful infrastructure challenges.

Requirements

  • 3+ years of experience in a Site Reliability Engineering, DevOps, or Infrastructure role
  • Strong hands-on experience with:
      • Azure Cloud services and resource management
      • Kubernetes and AKS administration, including deployments, networking, and troubleshooting
      • GitHub Actions for CI/CD pipeline development and maintenance
  • Solid experience with Grafana, including dashboard creation, alerting configuration, and incident management
  • Hands-on experience with Prometheus, Loki, or other observability tools in the Grafana ecosystem
  • Proficiency in at least one scripting or programming language such as Python or Bash
  • Understanding of networking fundamentals, DNS, load balancing, and container orchestration concepts
  • Strong analytical and communication skills; able to diagnose complex system issues and clearly communicate findings
  • Demonstrated ability to collaborate across teams and contribute to a culture of reliability
  • Experience working in an agile environment with modern DevOps practices

Nice To Haves

  • Experience working at a startup or in a fast-paced, cross-functional environment
  • Familiarity with the insurance industry or other regulated sectors
  • Experience with infrastructure-as-code tools such as Terraform or Pulumi
  • Familiarity with service mesh technologies (e.g., Istio)

Responsibilities

  • Manage and maintain cloud infrastructure on Azure, including Azure Kubernetes Service (AKS) clusters and supporting resources
  • Build, improve, and maintain CI/CD pipelines using GitHub Actions to support reliable and repeatable deployments
  • Own and enhance our Grafana implementation, including designing dashboards, configuring alerts, and supporting incident management workflows
  • Monitor system health, triage incidents, and drive root cause analysis to prevent recurrence
  • Collaborate with development teams to define and track SLIs, SLOs, and error budgets that align with business goals
  • Contribute to infrastructure-as-code practices using Pulumi
  • Identify and resolve reliability risks through capacity planning, performance tuning, and proactive system improvements
  • Participate in an on-call rotation to support production systems and respond to incidents
  • Document runbooks, operational procedures, and architectural decisions to support team knowledge sharing

Benefits

  • 100% remote
  • Health insurance through Aetna (we pay 100% of premiums)
  • Dental and vision insurance through Guardian (we pay 100% of premiums)
  • Basic life insurance (we pay 100% of premiums)
  • Access to a flexible spending account (FSA) or health savings account (HSA) (for those using HSA-eligible plans)
  • 401(k) plan (up to 4% match with immediate vesting). Must be 21 years of age or older to participate
  • Flexible PTO policy offering up to 3 weeks of time off during the first twelve months of employment to support onboarding and integration. After the first year, effective on the anniversary date, eligibility increases to up to 4 to 5 weeks of time off annually to recharge and sustain long-term success.
  • 12 company-paid holidays each year
  • Continuing education annual stipend
  • Annual salary estimated between $120,000 and $155,000 based on national data. Candidates who meet all the minimum requirements and possess additional relevant experience, as outlined in the job description, may be considered for a salary above the midpoint of this range. Salary is based on internal equity, internal salary ranges, market data, the applicant's skills, prior relevant experience, degrees or certifications, and similar factors.