Senior Machine Learning Operations Engineer

Garner Health
New York City, NY
Hybrid

About The Position

Garner is seeking a Senior MLOps Engineer to join their Platform Engineering team. This role will report to the Platform Engineering Manager, Developer Experience. As an early member of Garner's MLOps function, the engineer will help build and operate the production machine learning systems that power Garner's products. This involves partnering closely with machine learning and data science teams to enable the secure and consistent deployment of models. Given that these models directly influence health outcomes and cost-effectiveness for millions of patients, maintaining the highest standards of production quality is imperative.

Requirements

  • 5+ years of software engineering experience, with meaningful time spent operating ML or data-intensive systems in production.
  • Hands-on experience with the modern ML production stack: model serving (e.g., SageMaker, Triton, or equivalent), feature stores, model registries, and CI/CD for ML.
  • Strong infrastructure and platform engineering fundamentals: Kubernetes, containerization, cloud (AWS preferred), Terraform/IaC, observability, and incident response.
  • Experience building ML platforms or significant components of one (not strictly consuming SaaS), with sound judgment around when to build vs. buy.
  • A strong record of collaborating with ML, data, and platform engineers, data scientists, and product engineering teams, with the ability to lead projects and influence technical decisions.

Nice To Haves

  • Healthcare, regulated-data, or other high-stakes production ML experience.
  • A desire to be part of a high-performing, mission-driven team that operates with intense urgency, a strong sense of individual accountability, and a commitment to authentic feedback.

Responsibilities

  • Help ensure the reliability, performance, functionality, and cost-efficiency of Garner's production ML systems, contributing to SLOs, observability, and on-call responsibilities.
  • Build key components of Garner's ML platform, including data infrastructure (such as a feature store, model registry, and CI/CD for models) and standardized service patterns.
  • Implement ML-specific CI/CD pipelines: Help transition our deployment process from manual notebook hand-offs to automated, PR-driven CI/CD workflows that include automated data quality checks and statistical model validation prior to deployment.
  • Drive down cost and latency through improved architecture, hardware choices, and model optimization as appropriate.
  • Contribute to the workflows, standards, and KPIs that support a growing MLOps function, helping teammates and stakeholders quickly assess the health of the team's products and focus on the areas that need attention.
  • Help establish drift monitoring: Design and implement automated data drift and concept drift monitoring systems that alert the team when models degrade, laying the groundwork for future Continuous Training (CT) architectures.

Benefits

  • Flexible PTO
  • Medical/Dental/Vision plan options
  • 401(k)
  • Teladoc Health