Senior MLOps Engineer

CompScience•San Francisco, CA

63d•$175,000 - $225,000•Remote

About The Position

We are looking for an experienced, self-motivated Senior MLOps Engineer to join our growing team and take ownership of the infrastructure that powers our core machine learning products. As a key member of our engineering organization at a fast-growing Series B startup, you will design, build, and maintain the systems that automate the entire lifecycle of our ML models, from data pipelines and training to deployment and production monitoring. In this high-impact role, you will collaborate closely with our data science and engineering teams to ensure our cutting-edge risk assessment and underwriting models are scalable, reliable, and continuously improving.

Requirements

Bachelor's degree in Computer Science, Engineering, or a related technical field.
5+ years of professional experience in MLOps, DevOps, or a senior Data Engineering role with a focus on operationalizing machine learning models.
Expert-level proficiency in Python for pipeline automation and scripting, including extensive experience with the AWS SDK for Python (Boto3), plus strong Bash scripting skills.
Deep, hands-on experience with core AWS services, including S3, Lambda, SageMaker, and IAM, along with a solid understanding of VPC networking.
Proven experience building and deploying containerized applications (Docker), especially for serving ML models and LLM-based APIs.
Deep familiarity with Git workflows (branching, merging, rebasing) and experience implementing CI/CD pipelines using tools like GitHub Actions or AWS CodePipeline.
Demonstrated experience in designing and orchestrating complex, data-engineering-heavy pipelines, from data ingestion through to production inference.

Nice To Haves

An active AWS Certification, such as AWS Certified Machine Learning – Specialty or AWS Certified DevOps Engineer – Professional.
Proven experience designing and implementing comprehensive monitoring strategies and observability dashboards (CloudWatch, Grafana) to track model drift, latency, and throughput.
Familiarity with managing hybrid or edge inference deployments (Greengrass, Jetson) and supporting model fine-tuning workflows.

Responsibilities

Design, build, and own the end-to-end MLOps infrastructure on AWS, with a heavy emphasis on scalable data engineering and reliable, cost-efficient ML systems.
Implement and manage high-throughput, event-driven ML workflows (S3, Lambda, SQS, Step Functions, Batch) to support both data-centric pipelines and model execution.
Develop and maintain robust CI/CD pipelines for model deployment and promotion, enforcing best practices for Git, semantic versioning, and multi-branch release strategies.
Orchestrate complex data pipelines for the ingestion, processing, and updating of embeddings in vector databases (e.g., Qdrant, ChromaDB).
Establish and manage systems for managing training runs and tracking experiments (e.g., MLflow, SageMaker Experiments), and evaluate modern model-serving tools (e.g., BentoML).
Implement comprehensive security measures, including least-privilege access control (IAM) and secure credential management for models and APIs.
Collaborate with data science teams to translate prototypes (including LLMs and standalone APIs) into production-grade services with clear strategies for monitoring model health.