Machine Learning Engineer, Platform Integrations

TwelveLabs•San Francisco, CA

About The Position

TwelveLabs builds frontier multimodal foundation models for video understanding. Our models are deployed across a growing set of Cloud Service Provider (CSP) and data platforms — each with different compute hardware, ML inference stacks, and runtime constraints. You'll own the model-level engineering that makes this possible. This means optimizing TwelveLabs models for scalable, reliable, and performant inference across heterogeneous environments — designing how video decode pipelines, tensor orchestration, and model components behave on different hardware and inference engines. Every new platform is a new systems design problem at the model layer. You'll also design and implement massively distributed model inference systems for multimodal inputs, working across varied ML inference stacks — from hardware accelerators (NVIDIA, Trainium, Inferentia) to inference engines (vLLM, FriendliAI) and orchestrators (Ray, Anyscale). Your work directly determines how fast, how reliably, and at what cost TwelveLabs models serve inference at scale.

Requirements

8+ years building ML systems in production, with deep experience in model serving, inference optimization, capacity planning, and GPU compute
Deep understanding of the full model inference stack — from model weights and tensor operations through serving runtimes to accelerator hardware
Designed production services using Python, Postgres, FastAPI, SQLAlchemy, Pydantic (and friends)
Strong hands-on experience with cloud infrastructure (AWS, GCP or Azure), Docker, Kubernetes, and distributed systems in real-world environments — specifically in the context of ML inference and model hosting capabilities
Defined technical roadmap and prioritization for large, ambiguous, cross-functional projects, driving high-impact technical decisions

Nice To Haves

Direct experience working with cloud provider partner teams to scale infrastructure or products across multiple platforms — navigating differences in networking, security, billing, and managed service offerings
Background building platform-agnostic tooling or abstraction layers that work across cloud providers
Hands-on experience with capacity management, cost optimization, or resource planning at scale across heterogeneous environments
Familiarity with ML inference optimization, batching, caching, and serving strategies
Experience with ML infrastructure including GPUs, TPUs, Trainium, or other AI accelerators
Background designing CI/CD systems that automate deployment and validation across cloud environments
Proficiency in Python or Go

Responsibilities

Optimize TwelveLabs' video foundation models for deployment on model inference platforms across public clouds (AWS, Azure, GCP, OCI) and data platforms (Databricks, Snowflake)
Conduct experiments to benchmark and optimize model performance across inference stacks — measuring latency, throughput, and cost across different accelerator and serving configurations
Collaborate with platform partner engineering teams as a peer to resolve inference-level technical challenges and inform how their infrastructure evolves to support multimodal workloads
Work closely with TwelveLabs' core ML research team to ensure model architecture decisions account for multi-platform deployment requirements

Benefits

An open and inclusive culture and work environment.
Work closely with a collaborative, mission-driven team on cutting-edge AI technology.
Full health, dental, and vision benefits
Extremely flexible PTO and parental leave policy. Office closed the week of Christmas and New Years.
VISA support where applicable

Stand Out From the Crowd

Upload your resume and get instant feedback on how well it matches this job.

Upload and Match Resume