Site Reliability Engineer

Latent, San Francisco, CA
$165,000 - $250,000

About The Position

You are the infrastructure expert who enables our rapid product development and ensures 99.9%+ availability and strong performance for our clinical AI platform, which serves major health systems. Your focus on operational excellence directly affects patients' access to life-saving treatment.

Requirements

  • Deep, demonstrable experience with Kubernetes, Helm, and Terraform.
  • Proven ability to architect and maintain complex, distributed systems with high-availability requirements.
  • Hands-on experience optimizing deployment pipelines for both application code (TypeScript) and machine learning models (Python).
  • Experience with PostgreSQL, Redis, and Kafka.

Nice To Haves

  • Excitement about working five days per week in our San Francisco office.

Responsibilities

  • Design, implement, and maintain the production environment, drawing on your previous experience handling 500+ machine deployments.
  • Own our containerized infrastructure, leveraging deep expertise in Kubernetes and Helm to manage deployment, scaling, and operational health.
  • Optimize and streamline both the TypeScript and Python/ML deployment pipelines to support high-velocity feature releases without compromising reliability.
  • Support Developer Experience (DevX) work: streamline developer workflows, improve engineers' proficiency with our tooling, and strengthen CI/CD systems.
  • Manage and maintain infrastructure definitions using Terraform.
© 2024 Teal Labs, Inc