Software Engineer, Distributed Data Systems

Exa
San Francisco, CA
Onsite

About The Position

Exa is building a search engine from scratch to serve every AI application. We build massive-scale infrastructure to crawl the web, train state-of-the-art embedding models to index it, and develop high-performance vector databases in Rust to search over it. We also own a $5M H200 GPU cluster and regularly light up tens of thousands of machines. As a Data Engineer, you'll architect and build the data infrastructure that powers everything we do, from crawling billions of pages to training our embedding models to serving real-time search. You'll have enormous autonomy in designing systems that scale to hundreds of petabytes. If you've ever wanted to build data pipelines at a scale most companies only dream about, this is your chance.

Requirements

  • Deep understanding of lakehouse architectures (Delta Lake, Iceberg, Hudi) and when to use them
  • Experience building and operating large-scale distributed data processing pipelines
  • Hands-on experience with streaming data systems (Kafka, Flink, or similar)
  • Familiarity with Ray, Spark, or ClickHouse at production scale
  • An obsessive focus on reliability and building systems that don't page you at 3am

Nice To Haves

  • Experience with Lance or other vector-native storage formats
  • Background in GPU-accelerated data processing (RAPIDS, cuDF)