Senior Site Reliability Engineer

Coalition, Inc.
2d · $135,900 - $215,000 · Remote

About The Position

We are looking for a Senior Site Reliability Engineer to join our Platform SRE team. In this role, you will build and operate the infrastructure, tools, and "paved roads" that empower our developers to deliver scalable, secure, and reliable software with speed and confidence. You'll work across the entire stack—from infrastructure automation and observability to developer enablement and system reliability. You will be a key collaborator with software engineering and security teams, helping to evolve our Infrastructure as Code (IaC), enhance CI/CD pipelines, and scale our internal developer platform. We value pragmatism and engineering excellence, primarily using Python, Go, and AWS to reduce toil and build self-service capabilities.

Requirements

  • 6+ years of experience in SRE, DevOps, Cloud Engineering, or Software Development roles
  • Hands-on experience operating production environments in AWS
  • Proficiency in Go or Python, with experience building production-grade automation, tooling, or libraries
  • Strong experience with Terraform
  • Experience with container orchestration platforms like ECS or Kubernetes
  • Familiarity with CI/CD tools such as GitHub Actions
  • Experience designing and implementing reusable platform components based on team requirements
  • Solid understanding of observability practices including system metrics, distributed tracing, and SLOs
  • Exposure to failure-based testing approaches and automated recovery strategies
  • Strong leadership and communication skills, both written and verbal
  • Experience evangelizing reliability best practices

Nice To Haves

  • Experience with microservices architectures
  • Exposure to Kafka or other event streaming systems
  • Experience building internal developer platforms or self-service infrastructure
  • Familiarity with systems security, compliance requirements, or hardening practices

Responsibilities

  • Infrastructure Automation: Design, build, and scale production environments using AWS and Terraform, driving architectural decisions that improve long-term maintainability and reliability.
  • System Reliability: Lead efforts to improve platform resilience through failure-based testing, automated recovery strategies, and proactive capacity planning.
  • Developer Enablement: Own the design and delivery of reusable platform components and self-service tools that streamline the developer experience and reduce cross-team toil.
  • Observability: Define and evolve observability standards across the platform, including system metrics, distributed tracing, and SLO frameworks.
  • Project Ownership: Own projects end to end—from initial scoping and effort estimation through detailed planning, execution, and successful rollout.
  • Mentorship & Standards: Mentor engineers across the team, uphold high infrastructure quality, and actively shape the best practices and standards used by the organization.
  • Collaboration: Engage in technical design discussions, providing guidance and feedback while adapting strategies based on team input and evolving requirements.

Benefits

  • 100% medical, dental, and vision coverage
  • Flexible PTO
  • Annual home office stipend and WeWork access
  • Mental and physical wellness programs like Headspace, Lumino, and more!
  • Competitive compensation and opportunity for advancement