Senior Machine Learning Engineer, MLOps

ExaCare AI, New York, NY
Hybrid

About The Position

We are looking for a Senior Machine Learning Engineer, MLOps, to help operationalize and scale our machine learning systems. This engineering-focused role centers on building the workflows, infrastructure, and processes that take ML from research into reliable production. You will partner closely with research-oriented ML teammates to turn their work into scalable, maintainable, and cost-effective systems. That means building and improving the data pipelines, training pipelines, deployment workflows, monitoring systems, and supporting infrastructure that let the team move faster and operate ML systems with confidence.

This is not a research-first role. It is best suited for someone who is excited by the systems, tooling, and operational side of machine learning.

Requirements

  • Several years of experience in machine learning engineering, MLOps, ML infrastructure, data engineering, or backend/platform engineering in ML environments
  • Experience supporting ML systems end to end, from model handoff through deployment and monitoring
  • Strong experience building and owning data pipelines, training pipelines, or other production workflows that support ML
  • Experience working closely with researchers, data scientists, or ML practitioners to productionize models
  • Strong software engineering fundamentals and experience building production systems
  • Experience with monitoring, debugging, and improving production ML or data systems
  • A track record of improving reliability, scalability, speed, and/or cost efficiency in ML systems
  • Comfort operating in a fast-moving, startup-style environment with a high degree of ownership

Responsibilities

  • Build and maintain the workflows and infrastructure that support the end-to-end ML lifecycle
  • Partner with researchers and ML practitioners to productionize models and enable faster iteration
  • Design, build, and improve data pipelines and training pipelines
  • Improve data processing, annotation workflows, and ML system efficiency
  • Deploy and maintain the backend systems that support model training and inference
  • Build tooling and processes for monitoring model performance, system reliability, and operational health
  • Improve the scalability, observability, and reproducibility of ML systems
  • Optimize ML infrastructure for speed, reliability, and cost efficiency
  • Identify bottlenecks in the ML workflow and automate or streamline manual processes
  • Help establish best practices around ML operations, deployment, and system performance

Benefits

  • Competitive salary and equity in a high-growth startup
  • Flexible PTO: take what you need
  • Medical, dental, and vision coverage
  • Great startup culture, including company off-sites
  • High-achieving team, including ex-Amazon engineers and alumni of Bain, BCG, Goldman Sachs, and more