MLOps Engineer

BizFirst•Alexandria, VA

Hybrid

About The Position

BizFirst is assisting our client with the hiring of an MLOps Engineer to build and operate the infrastructure, tooling, and processes that keep machine learning models running reliably in production. This is a foundational role in the client’s growing AI practice, sitting at the intersection of data engineering, platform engineering, and applied ML – where your work directly enables data scientists and ML engineers to move faster and ship with confidence.

Our client is a mid-market professional services organization that is actively rethinking how it designs and executes its core business operations through artificial intelligence and automation. The company is building a dedicated AI capability to embed machine learning and generative AI into its most critical internal workflows, from decision support and process automation to real-time analytics and intelligent document processing.

The ideal candidate has 4–8 years of experience in MLOps, DevOps, or platform/data engineering, with direct experience standing up and maintaining ML infrastructure in cloud environments. You have worked with CI/CD pipelines, containerized ML workloads, and model registries – and you understand what it takes to move models from a notebook to a production system that is observable, scalable, and maintainable.

Requirements

US Citizen or Permanent Resident authorized to work in the United States.
Experience: 4–8 years in MLOps, platform engineering, or a DevOps role with direct ML workload responsibility.
Infrastructure: Proficiency with Docker, Kubernetes, and cloud platforms (AWS SageMaker, GCP Vertex AI, or Azure ML).
Pipelines: Hands-on experience with orchestration tools such as Airflow, Prefect, Kubeflow Pipelines, or similar.
ML Tooling: Working knowledge of MLflow, Weights & Biases, or equivalent experiment tracking and model registry platforms.
Programming: Strong Python skills; comfort writing infrastructure-as-code (Terraform, Pulumi, or CloudFormation).
Monitoring: Experience building observability into production ML systems – metrics, logging, alerting, and dashboards.

Nice To Haves

Experience supporting generative AI workloads, including LLM inference infrastructure and GPU resource management.
Familiarity with feature stores (Feast, Tecton, or similar) and online/offline feature serving patterns.
Background working in a fast-moving team where data scientists and ML engineers are primary customers.
Experience with cost optimization strategies for large-scale cloud-based ML training and inference.
Degree in Computer Science, Software Engineering, or a related technical field.

Responsibilities

Design, build, and maintain end-to-end ML pipelines, including data ingestion, feature engineering, model training, evaluation, and deployment.
Implement and manage CI/CD workflows for ML models, ensuring consistent, automated paths from experimentation to production.
Own the model registry, versioning strategy, and experiment tracking infrastructure used across the AI team.
Build monitoring and alerting systems to detect model drift, data quality issues, and performance degradation in deployed systems.
Manage containerized ML workloads using Docker and Kubernetes, including scheduling, resource allocation, and cost optimization.
Collaborate closely with data scientists and ML engineers to understand infrastructure needs and reduce friction in the development lifecycle.
Evaluate and adopt MLOps tooling (orchestration, feature stores, serving frameworks) to mature the team’s operational practices.
Develop runbooks, documentation, and incident response procedures for production ML systems.