Machine Learning Engineer, Platform Integrations

TwelveLabs
San Francisco, CA

About The Position

TwelveLabs builds frontier multimodal foundation models for video understanding. Our models are deployed across a growing set of Cloud Service Provider (CSP) and data platforms — each with different compute hardware, ML inference stacks, and runtime constraints.

You'll own the model-level engineering that makes this possible. This means optimizing TwelveLabs models for scalable, reliable, and performant inference across heterogeneous environments — designing how video decode pipelines, tensor orchestration, and model components behave on different hardware and inference engines. Every new platform is a new systems design problem at the model layer.

You'll also design and implement massively distributed model inference systems for multimodal inputs, working across varied ML inference stacks — from hardware accelerators (NVIDIA, Trainium, Inferentia) to inference engines (vLLM, FriendliAI) and orchestrators (Ray, Anyscale). Your work directly determines how fast, how reliably, and at what cost TwelveLabs models serve inference at scale.

Requirements

  • 8+ years building ML systems in production, with deep experience in model serving, inference optimization, capacity planning, and GPU compute
  • Deep understanding of the full model inference stack — from model weights and tensor operations through serving runtimes to accelerator hardware
  • Experience designing production services with Python, Postgres, FastAPI, SQLAlchemy, Pydantic, and related libraries
  • Strong hands-on experience with cloud infrastructure (AWS, GCP, or Azure), Docker, Kubernetes, and distributed systems in real-world environments — specifically in the context of ML inference and model hosting
  • Experience defining the technical roadmap and prioritization for large, ambiguous, cross-functional projects and driving high-impact technical decisions

Nice To Haves

  • Direct experience working with cloud provider partner teams to scale infrastructure or products across multiple platforms — navigating differences in networking, security, billing, and managed service offerings
  • Background building platform-agnostic tooling or abstraction layers that work across cloud providers
  • Hands-on experience with capacity management, cost optimization, or resource planning at scale across heterogeneous environments
  • Familiarity with ML inference optimization, batching, caching, and serving strategies
  • Experience with ML infrastructure including GPUs, TPUs, Trainium, or other AI accelerators
  • Background designing CI/CD systems that automate deployment and validation across cloud environments
  • Proficiency in Python or Go
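The batching and serving strategies mentioned above are representative of the inference-level work this role centers on. As a rough illustration only (this is not TwelveLabs' actual stack; every name below is hypothetical, and a stand-in function replaces the real model call), a minimal dynamic micro-batching loop in Python looks like this:

```python
import queue
import threading
import time

MAX_BATCH = 8      # flush once this many requests have accumulated
MAX_WAIT_S = 0.01  # ...or once the oldest request has waited this long


def run_model(batch):
    """Stand-in for a real model call; a production server would
    invoke an inference engine on an accelerator here."""
    return [x * 2 for x in batch]


class MicroBatcher:
    """Groups incoming requests into small batches, trading a little
    per-request latency for higher accelerator throughput."""

    def __init__(self):
        self._q = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, x):
        """Enqueue one request and block until its result is ready."""
        slot = {"input": x, "done": threading.Event()}
        self._q.put(slot)
        slot["done"].wait()
        return slot["output"]

    def _loop(self):
        while True:
            first = self._q.get()  # block until at least one request arrives
            batch = [first]
            deadline = time.monotonic() + MAX_WAIT_S
            # Collect more requests until the batch fills or the deadline hits.
            while len(batch) < MAX_BATCH:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    batch.append(self._q.get(timeout=timeout))
                except queue.Empty:
                    break
            outputs = run_model([s["input"] for s in batch])
            for slot, out in zip(batch, outputs):
                slot["output"] = out
                slot["done"].set()
```

Real serving stacks (vLLM's continuous batching, for example) are far more sophisticated — scheduling at the token level and managing accelerator memory — but the latency-versus-throughput trade-off shown by the batch-size and wait-time knobs is the same one this role would tune per platform.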

Responsibilities

  • Optimize TwelveLabs' video foundation models for deployment on model inference platforms across public clouds (AWS, Azure, GCP, OCI) and data platforms (Databricks, Snowflake)
  • Conduct experiments to benchmark and optimize model performance across inference stacks — measuring latency, throughput, and cost across different accelerator and serving configurations
  • Collaborate with platform partner engineering teams as a peer to resolve inference-level technical challenges and inform how their infrastructure evolves to support multimodal workloads
  • Work closely with TwelveLabs' core ML research team to ensure model architecture decisions account for multi-platform deployment requirements

Benefits

  • An open and inclusive culture and work environment
  • Close collaboration with a mission-driven team on cutting-edge AI technology
  • Full health, dental, and vision benefits
  • Extremely flexible PTO and parental leave policy; the office is closed the week of Christmas and New Year's
  • Visa support where applicable