Staff MLOps Engineer – LLMOps

TRM Labs
$220,000 - $240,000

About The Position

The AI Engineering Team is chartered with enabling next-generation AI applications, with a special focus on Large Language Models (LLMs) and agentic systems. Our mission is to build robust pipelines, high-performance infrastructure, and operational tooling that allow AI systems to be deployed with speed, safety, and scale. We manage petabyte-scale pipelines, serve models with millisecond-level latency, and provide the observability and governance needed to make AI production-ready.

We are also deeply involved in evaluating and integrating cutting-edge tools in the LLM and agent space, including open-source stacks, vector databases, evaluation frameworks, and orchestration tools that let TRM innovate faster than the market. As a Staff MLOps Engineer with a focus on LLMOps, you will be at the core of building and scaling the technical infrastructure for our AI/ML systems.

Requirements

  • Write high-quality, maintainable software, primarily in Python; we value engineering ability over language familiarity.
  • Have a strong background in scalable infrastructure, including:
    • Containerization and orchestration (e.g., Docker, Kubernetes)
    • Infrastructure-as-code and deployment (e.g., Terraform, CI/CD pipelines)
    • Monitoring and logging frameworks (e.g., Datadog, Prometheus, OpenTelemetry)
  • Understand and implement MLOps best practices, including:
    • Model versioning and rollback strategies
    • Automated evaluation and drift detection
    • Scalable model and agent serving infrastructure (e.g., vLLM, Triton, BentoML)
  • Deploy and maintain LLM and agentic workflows in production, including:
    • Monitoring cost, latency, and performance
    • Capturing traces for analysis and debugging
    • Optimizing prompt/response flows with real-time data access
  • Demonstrate strong ownership and pragmatism, balancing infrastructure elegance with iterative delivery and measurable impact.

Responsibilities

  • Build reusable CI/CD workflows for model training, evaluation, and deployment, integrating tools such as Langfuse, GitHub Actions, and experiment tracking.
  • Automate model versioning, approval workflows, and compliance checks across environments.
  • Build out a modular, scalable AI infrastructure stack, including vector databases, feature stores, model registries, and observability tooling.
  • Partner with engineering and data science to embed AI models and agents into real-time applications and workflows.
  • Continuously evaluate and integrate state-of-the-art AI tools (e.g., LangChain, LlamaIndex, vLLM, MLflow, BentoML).
  • Drive AI reliability and governance, enabling experimentation while ensuring compliance, security, and uptime.
  • Improve AI/ML model performance.
  • Ensure data accuracy, consistency, and reliability, leading to better model training and inference.
  • Deploy infrastructure to support offline and online evaluation of LLMs and agents, including regression testing, cost monitoring, and human-in-the-loop workflows.
  • Enable researchers to iterate quickly by providing sandboxes, dashboards, and reproducible environments.


What This Job Offers

Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed
Number of Employees: 101-250 employees

© 2024 Teal Labs, Inc