Senior Machine Learning Engineer

ExaCare AI • New York, NY

Hybrid

About The Position

We are looking for a Senior Machine Learning Engineer, MLOps, to help operationalize and scale our machine learning systems. This is an engineering-focused role centered on building the workflows, infrastructure, and processes that move ML from research into reliable production systems. You will partner closely with research-oriented ML teammates and turn their work into scalable, maintainable, and cost-effective production systems. That includes building and improving the data pipelines, training pipelines, deployment workflows, monitoring systems, and supporting infrastructure that let the team move faster and operate ML systems with confidence.

This is not a research-first role. It is best suited to someone who is excited by the systems, tooling, and operational side of machine learning.

Requirements

Several years of experience in machine learning engineering, MLOps, ML infrastructure, data engineering, or backend/platform engineering in ML environments
Experience supporting ML systems end to end, from model handoff through deployment and monitoring
Strong experience building and owning data pipelines, training pipelines, or other production workflows that support ML
Experience working closely with researchers, data scientists, or ML practitioners to productionize models
Strong software engineering fundamentals and experience building production systems
Experience with monitoring, debugging, and improving production ML or data systems
A track record of improving reliability, scalability, speed, and/or cost efficiency in ML systems
Comfort operating in a fast-moving, startup-style environment with a high degree of ownership

Responsibilities

Build and maintain the workflows and infrastructure that support the end-to-end ML lifecycle
Partner with researchers and ML practitioners to productionize models and enable faster iteration
Design, build, and improve data pipelines and training pipelines
Improve data processing, annotation workflows, and ML system efficiency
Deploy and maintain the backend systems that support model training and inference
Build tooling and processes for monitoring model performance, system reliability, and operational health
Improve the scalability, observability, and reproducibility of ML systems
Optimize ML infrastructure for speed, reliability, and cost-efficiency
Identify bottlenecks in the ML workflow and automate or streamline manual processes
Help establish best practices around ML operations, deployment, and system performance

Benefits

Competitive salary and equity in a high-growth startup
Flexible PTO: take what you need
Medical, dental, and vision coverage
Great startup culture, including company off-sites
High-achieving team, including ex-Amazon engineers and alumni of Bain, BCG, Goldman Sachs, and more
