About The Position

Are you passionate about automation, cloud infrastructure, Kubernetes, and reliability engineering? As a Senior Production Engineer (SRE) at Legion, you will build and operate a secure, highly scalable, and cost-effective AWS/Kubernetes-based cloud platform. You will work across infrastructure automation, CI/CD pipelines, observability, and production reliability. Simply put, the SRE team ensures Legion’s platform is reliable, scalable, and continuously improving for our customers. This role includes participation in an on-call rotation.

Requirements

  • 5-8+ years of experience in SRE, DevOps, or SaaS production operations.
  • 3+ years of hands-on experience operating production workloads in AWS.
  • Strong experience with Terraform and infrastructure-as-code practices.
  • 3+ years of experience with containerized environments using Docker and Kubernetes (EKS preferred); familiarity with Helm.
  • Proficiency in Go or Python (or a similar programming language).
  • Experience building and maintaining CI/CD systems (Git-based workflows, Argo, Jenkins, or similar).
  • Strong Linux/Unix systems experience.
  • Bachelor’s degree in Computer Science or equivalent practical experience.

Nice To Haves

  • Experience with observability tools such as Datadog, CloudWatch, ELK stack, Prometheus, or similar.
  • Experience managing AWS RDS and/or Aurora MySQL, including slow query analysis, replication, and upgrade operations.
  • Experience implementing SLIs/SLOs and reliability best practices.
  • Experience working effectively with remote, distributed teams.
  • Experience supporting SOC 2 / ISO 27001 audits.
  • AWS certification.

Responsibilities

  • Support and operate Legion’s AWS-based cloud platform and Kubernetes (EKS) environments.
  • Leverage GenAI tools (e.g., Claude Code, Codex, or similar) to accelerate infrastructure development, automation, and auto-remediation of common production issues.
  • Build and maintain infrastructure-as-code using Terraform.
  • Develop automation and internal tooling using Go or Python.
  • Improve CI/CD pipelines to increase deployment safety and velocity.
  • Define and improve monitoring, alerting, and observability systems.
  • Respond to production incidents, conduct root cause analysis, and implement systemic improvements.
  • Develop and automate operational runbooks and remediation workflows.
  • Support production deployments, including during off-hours as needed.

Benefits

  • $0 monthly premium and other flexible medical, dental, and vision plans effective on the first day of employment
  • 401k plan
  • Discretionary Paid Time Off and Paid Holidays
  • Parental Leave
  • Equity
  • Monthly Wellness Reimbursement
  • Monthly Lunch on Legion