ML Ops Engineer (AI)

SewerAI Corporation

$130,000 - $160,000

About The Position

SewerAI is transforming underground infrastructure management through AI-powered inspection and risk analysis. Our platform helps contractors, engineering firms, and utilities unlock valuable insights from sewer inspection data, turning hours of manual video review into actionable intelligence in minutes. After doubling our customer base over the past year, we're now entering a phase of accelerated growth.

We're looking for an MLOps Engineer to own the machine learning operations infrastructure behind our AI products. In this role, you will be the architectural backbone of our machine learning systems, responsible for designing, hardening, and scaling the infrastructure that supports our applied models for underground infrastructure and sewer line analysis. You will focus on moving research and development work into robust, production-ready systems. This means taking ownership of our training and inference pipelines, fortifying our cloud-based architecture, and building seamless CI/CD processes so that our models deliver reliable, high-performing, and secure results for defect detection and infrastructure maintenance.

Requirements

Cloud Infrastructure: Deep expertise in AWS (e.g., EC2, S3, EKS, SageMaker, Lambda) and cloud security best practices.
Containerization & Orchestration: Strong experience with Docker and Kubernetes for packaging and scaling ML applications.
Infrastructure as Code (IaC): Proficiency with tools like Terraform or AWS CloudFormation.
CI/CD Pipelines: Experience building robust automated pipelines using GitHub Actions, GitLab CI, or Jenkins.
Programming: Strong Python skills with a focus on writing clean, production-grade, and well-tested code.
MLOps Frameworks: Familiarity with model registry and tracking tools (e.g., MLflow, Weights & Biases).
4-6+ years of experience in MLOps, DevOps, or Data Engineering, with a strong emphasis on machine learning workloads.
A security-first and stability-first mindset—you think about edge cases, failure modes, and system hardening by default.
Strong collaborative instincts to work closely with Data Scientists, ensuring smooth handoffs from experimentation to production.
Clear communication skills to articulate architectural decisions and tradeoffs to the broader technical team.

Nice To Haves

Experience with our specific data stack (Hex, dbt, ClickHouse, Anyscale, Ray, Deeplake).
Familiarity with deep learning frameworks (PyTorch preferred) and optimization techniques like TensorRT or ONNX.
Knowledge of edge computing or deploying models to IoT devices.
Experience in the infrastructure, utility, or geospatial domains.

Responsibilities

Architectural Hardening: Audit, secure, and optimize our existing cloud infrastructure (AWS) to ensure high availability, fault tolerance, and security for both training and production workloads.
Model Deployment & Inference: Design and maintain scalable architectures for serving deep learning models (PyTorch/TensorFlow), optimizing for low latency and high throughput when processing complex infrastructure data.
CI/CD for Machine Learning: Build and maintain automated pipelines for model testing, validation, deployment, and rollback.
Training Infrastructure: Architect efficient, scalable compute environments for training complex computer vision and time-series models on large datasets.
Monitoring & Observability: Implement comprehensive monitoring for model drift, data quality, and system health, ensuring rapid response to performance degradation.