Staff Site Reliability Engineer

TwelveLabs
San Francisco, CA
Remote

About The Position

As a Staff Site Reliability Engineer at Twelve Labs, you will own the reliability, scalability, and operability of the infrastructure that powers our multimodal foundation models. You'll be hands-on — building systems when needed, but with a primary focus on ensuring production stays healthy, observable, and resilient. You'll work most closely with the product teams in the US, supporting the infrastructure behind our core AI products. This role requires deep operational instincts, strong debugging skills, and the ability to balance long-term reliability investments against the pace of an early-stage AI company.

Requirements

  • 7+ years of experience operating production infrastructure systems, not just building them.
  • Strong hands-on experience with AWS and Kubernetes in production environments.
  • Solid fundamentals in OS internals, networking, storage, and compute — the kind that help you debug a problem at 3am without documentation.
  • Deep practical experience with observability (Prometheus/Grafana/Loki or equivalent), Infrastructure as Code (Terraform, Ansible), and CI/CD.
  • Track record of owning services end to end — deployment, monitoring, incident response, and postmortem follow-through.

Responsibilities

  • Own production reliability end to end — from deployment through monitoring, incident response, and postmortem-driven improvement.
  • Partner with the product engineering teams to ensure their services are reliable, observable, and operable by design.
  • Build and maintain observability systems (metrics, logging, tracing, alerting) that give the team clear signal on system health and performance.
  • Design and operate cloud infrastructure supporting AI/ML workloads.
  • Drive incident response — detect, diagnose, mitigate, and prevent production issues. Build the runbooks, automation, and guardrails that reduce mean time to recovery.
  • Identify and eliminate toil through automation, self-healing systems, and better tooling.

Benefits

  • Full health, dental, and vision benefits
  • Extremely flexible PTO and parental leave policy
  • Monthly wellness stipend
  • Annual Learning & Development stipend to invest in your growth
  • Transportation stipend
  • Daily lunch & dinner provided