Senior MLOps Engineer

Teleo•Palo Alto, CA

5d•$200,000 - $250,000

About The Position

Own the reliability, scalability, and velocity of model training and deployment for autonomy systems. Turn experimental models into dependable production services.

Requirements

2+ years of experience in MLOps, infrastructure, or ML platform engineering
Deep experience with PyTorch and CUDA-aware workflows
Strong Linux + systems fundamentals
Proven experience deploying models at scale (not just notebooks)

Nice To Haves

Training orchestration: Ray, Slurm, Kubernetes, Airflow
Model lifecycle: Weights & Biases, MLflow, custom registries
Containers: Docker, multi-arch builds
Inference optimization: TensorRT, ONNX, Triton
Monitoring: metrics, logs, and alerts for ML systems
Experience with autonomy or robotics
Familiarity with edge deployment constraints (latency, power, thermal)
Data versioning tools (DVC, LakeFS)

Responsibilities

Design and operate end-to-end ML infrastructure: training, evaluation, deployment, monitoring
Build CI/CD for ML (model versioning, promotion, rollback, canarying)
Own model observability: drift detection, performance regression, data health
Optimize GPU utilization across training and inference (on-prem + cloud)
Support edge deployment (Jetson / Orin / x86 + GPU)
Work closely with perception and autonomy teams to reduce friction from research to production
