Research Engineer, Multimodal Data

Eventual
San Francisco, CA
Hybrid

About The Position

Eventual is building a video-native index on top of its open-source engine, Daft, which is purpose-built for multimodal AI. The index aims to accelerate the iteration loop for Physical AI teams by letting them describe the dataset they want and receive a curated table in minutes, ready to feed GPUs at line rate. The company has raised $30M and has a world-class team from companies like AWS, Render, Pinecone, and Tesla. They are looking for people passionate about powering the next generation of Physical AI.

Requirements

  • Strong familiarity with modern vision and multimodal models — convolutional nets, VLMs, VQA, embeddings — and a sense for the SOTA that's actually deployable today vs. on a leaderboard.
  • Experience running these models at scale on real video and sensor data, ideally for perception tasks (detection, tracking, segmentation, retrieval, captioning).
  • Background from a perception team at a self-driving, robotics, or visual-data company — or equivalent depth from a research lab.
  • Comfortable with cloud infrastructure and large-scale data processing — you don't need to be a distributed-systems engineer, but you've shipped jobs that ran on thousands of GPU-hours of video.
  • Bias toward data and infrastructure: you reach for "annotate the whole corpus" before "fine-tune another model."

Nice To Haves

  • Experience training vision or multimodal models from scratch (not just calling APIs).
  • ML/AI research background — papers, citations, or a research org on your resume.
  • Hands-on time with big-data frameworks like Spark, Ray, or Daft.
  • Worked on embeddings, retrieval, or content-aware search at scale.
  • Experience designing labeling taxonomies or running annotation programs.

Responsibilities

  • Own the visual understanding roadmap end-to-end: from picking the model family for a customer's taxonomy to landing it in production inference at corpus scale.
  • Train, fine-tune, and evaluate VLMs, VQA models, embedding models, and convolutional perception models against customer datasets and benchmarks.
  • Drive down per-clip annotation cost — model selection, distillation, batching, decode pipelining — so "annotate every clip in a 10K-hour corpus" stays economical.
  • Build the rich, queryable datasets that customers train on: design taxonomies with researchers, instrument quality, version the outputs.
  • Partner with the dataloading and storage teams so visual understanding outputs flow into the index and on to the GPU without re-engineering.
  • Work directly with researchers at our partner labs — your shortest feedback loop is their next training iteration.

Benefits

  • Competitive comp and meaningful startup equity.
  • Catered lunches and dinners for SF employees.
  • Commuter benefits.
  • Team-building events and poker nights.
  • Health, vision, and dental coverage.
  • Flexible PTO.
  • Latest Apple equipment.
  • 401(k) plan with match.