Staff Machine Learning Operations Engineer

Garner Health
New York, NY
$298,000 - $351,000
Hybrid

About The Position

Garner is seeking an exceptional Staff MLOps Engineer to join its Platform Engineering team. This role will report to the VP of Platform Engineering. As Garner's first dedicated MLOps Engineer, you will be responsible for the reliability, performance, and cost-efficiency of its production machine learning systems. You will lead the development of a robust platform that lets its machine learning and data science teams deploy models securely and consistently. Because these models directly influence health outcomes and cost-effectiveness for millions of patients, maintaining the highest standards of production quality is imperative.

Requirements

  • 7+ years of software engineering experience, with significant time spent operating ML or data-intensive systems in production at scale.
  • Deep experience with the modern ML production stack: model serving (e.g., SageMaker, Triton, or equivalent), feature stores, model registries, and CI/CD for ML.
  • Strong infrastructure and platform engineering fundamentals: Kubernetes, containerization, cloud (AWS preferred), Terraform/IaC, observability, and incident response.
  • Experience designing ML platforms or significant components of one (not strictly consuming SaaS) and the judgment to know when to build vs. buy.
  • Strong collaboration with ML, data, and platform engineers, data scientists, and product engineering teams, with the ability to set technical direction as the most senior MLOps voice in the org.

Nice To Haves

  • Healthcare, regulated-data, or other high-stakes production ML experience is a plus but not required.
  • A desire to be a part of a high-performing, mission-driven team that operates with intense urgency, a strong sense of individual accountability, and a commitment to authentic feedback.

Responsibilities

  • Own the reliability, performance, functionality, and cost-efficiency of Garner's production ML systems, including establishing SLOs, observability, and on-call responsibilities.
  • Architect Garner's ML platform, including the required data infrastructure (feature store, model registry, and CI/CD for models) and standardized service patterns.
  • Implement ML-specific CI/CD pipelines: Transition our deployment process from manual notebook hand-offs to automated, PR-driven CI/CD workflows that include automated data quality checks and statistical model validation prior to deployment.
  • Drive down cost and latency through improved architecture, hardware choices, and model optimization as appropriate.
  • Lay the foundation for a future Garner MLOps team, including workflows, standards, and KPIs that enable rapid teammate onboarding and help stakeholders and teammates quickly assess the health of the team’s products, so engineers can focus on the areas where issues reside.
  • Establish Drift Monitoring: Design and implement automated data drift and concept drift monitoring systems that alert the team when models degrade, laying the groundwork for future Continuous Training (CT) architectures.

Benefits

  • Flexible PTO
  • Medical/Dental/Vision plan options
  • 401(k)
  • Teladoc Health