Software Engineer, Multimodal Storage Infrastructure

Eventual • San Francisco, CA

Onsite

About The Position

Eventual is building a multimodal warehouse on top of Daft, its open-source distributed data engine purpose-built for multimodal AI. Today's data platforms are not designed for AI data, so teams end up building their own solutions. Eventual aims to provide this missing layer: video, sensor data, and simulation outputs co-indexed on the same row, aligned on timecode, and versioned, with a content-aware query layer on top. The company has raised $30M and has a world-class team from companies like AWS, Render, Pinecone, and Tesla. It is looking for people who are passionate about building the future of Physical AI infrastructure.

Requirements

Love thinking about indices. B+ trees, LSM trees, bitmap indices, vector indices, learned indices — you have favorites and you have grudges.
Love thinking about query engines. Predicate pushdown makes you happy. Late materialization makes you happier.
Strong familiarity with the storage hierarchy: cloud object stores, NVMe, block storage, spinning disk, RAM, GPU memory — and the latency and cost of moving between them.
Strong opinions about Parquet — love it or hate it, you've earned the opinion. Same for Iceberg, Delta, Lance, and the other lakehouse formats.
A real love for databases and query systems. You read database papers for fun.
Believe the best read is the read elided.

Nice To Haves

Background from a storage or table-format team — Lance, Iceberg, Delta, Hudi, Spiral, Snowflake, BigQuery, Databricks Photon, DuckDB, ClickHouse, or similar.
Have attempted to build your own database before. Or, at minimum, fantasized about it in detail.
Experience with Rust or modern C++ for storage engines.
Hands-on time with vector indices (HNSW, IVF, ScaNN) or hybrid retrieval systems.
Comfort with the OLAP/lakehouse ecosystem: catalogs, file layout, compaction, manifest formats, time travel.

Responsibilities

Design and build the storage and indexing layer: row groups, column chunks, secondary indices, vector indices, and the metadata that lets queries skip everything that doesn't matter.
Push the query engine harder — predicate pushdown, projection pushdown, late materialization — across multimodal columns including video, embeddings, and sensor streams.
Choose, extend, or build on top of modern open formats (Parquet, Iceberg, Delta, etc.), and build our own or contribute upstream where it makes sense.
Build versioning and schema evolution for multimodal datasets so customer data stays reproducible across months of experimentation.
Partner with the Dataloading team on the format-to-loader boundary so that an iceberg.scan(...) call translates into the absolute minimum number of bytes hitting NVMe.
Partner with the Visual Understanding team to land model outputs in the index without an external glue layer.