Staff ML/LLM Ops Engineer

LVT•Seattle, WA

4d•$213,300 - $272,000

About The Position

We are seeking a Staff ML/LLM Ops Engineer to own the model lifecycle as infrastructure, turning the path from research to production into standardized, self-serve tooling. The model portfolio this platform serves spans both the computer-vision models in production today and a growing set of LLM, VLM, and agentic workloads. Bringing those generative workloads under the same lifecycle discipline (serving, version pinning, evaluation, guardrails, and cost and latency monitoring) is part of this role's scope. This is a senior individual-contributor and technical-leadership role. You will partner closely with AI/ML research, the application backend team, and the platform and infrastructure teams. You should be equally comfortable discussing model-serving architectures, CI/CD and rollback design, polyglot service contracts, and production observability.

Requirements

8+ years of engineering experience with deep ML-infrastructure / MLOps work, including building and operating a model deployment, serving, and monitoring platform in production.
Hands-on experience operating LLM or VLM workloads in production, including model serving or managed-provider integration, prompt and version management, generative evaluation, guardrails, and token cost and latency control.
Experience designing self-serve ML deployment for other teams, including model registry and packaging, CI/CD for models, serving contracts, rollback, and drift/quality monitoring.
Strong systems and API design judgment across a polyglot boundary, with the operational maturity to own security, observability, and on-call trade-offs.
A track record of setting technical direction and leveling up engineers (technical leadership; formal management not required).
Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

Nice To Haves

Computer Vision / video model inference at scale (GPU serving, latency and cost optimization).
Cloud-native infrastructure (Kubernetes, Argo, or a comparable deployment stack).
Experience standing up an ML platform from zero on a team that did not have one.
Experience deploying AI models to edge environments (e.g. NVIDIA Jetson or similar).
Agentic and generative tooling: LangGraph, MCP frameworks, vector databases, and inference/serving platforms.

Responsibilities

Own the model lifecycle end to end: standardized packaging, a model CI/CD path, a serving layer with stable, versioned contracts, automated deployment and rollback, and monitoring and drift detection.
Bring LLM, VLM, and agentic workloads under the same platform discipline as the vision models: serving with models and prompts version-pinned as deployable, rollback-able artifacts; generative evaluation and regression suites that don't reduce to precision/recall; production guardrails such as input/output filtering and jailbreak and refusal monitoring; and token-level cost and latency observability. Where retrieval or agent orchestration is in play, own the operational seams (vector stores, request tracing) the same way.
Make the path from research to production self-serve and safe by encoding the security, observability, and on-call guardrails engineers enforce by hand today, so model owners can ship without lowering the operational bar.
Define and own the contract boundary between the model platform and the application backend so engineers integrate against deployed models independently.
Set technical standards and mentor ICs, steering their productionization work toward the platform and growing the function as the team forms.