Site Reliability Engineer

Latent • San Francisco, CA

139d • $165,000 - $250,000

About The Position

You are the infrastructure expert who enables our rapid product development and guarantees 99.9%+ stability and performance of our clinical AI platform for major health systems. Your focus on operational excellence is directly tied to a patient's access to life-saving treatment.

Requirements

Deep, demonstrable experience with Kubernetes, Helm, and Terraform.
Proven ability to architect and maintain complex, distributed systems with high-availability requirements.
Hands-on experience optimizing deployment pipelines for both application code (TypeScript) and machine learning models (Python/ML).
Experience with PostgreSQL, Redis, and Kafka.

Nice To Haves

Excitement about working five days per week in our San Francisco office.

Responsibilities

Design, implement, and maintain the production environment, drawing on prior experience handling deployments of 500+ machines.
Own our containerized infrastructure, leveraging deep expertise in Kubernetes and Helm to manage deployment, scaling, and operational health.
Optimize and streamline both the TypeScript and Python/ML deployment pipelines to support high-velocity feature releases while maintaining high reliability.
Support Developer Experience (DevX) work to streamline developer workflows, enhance tool proficiency, and improve CI/CD systems.
Manage and maintain infrastructure definitions using Terraform.
