Data Engineer

Humble Robotics • San Francisco, CA

About The Position

We're looking for a data engineer to help us turn raw driving data into the well-structured, queryable, and trustworthy datasets that power our autonomy stack. You'll work across the full lifecycle of our data, from ingestion to the point where it is pulled into a training run, and your work will directly shape how quickly the rest of the team can iterate.

Requirements

BS, MS, or PhD in Computer Science, Engineering, Robotics, or a related field — or equivalent industry experience
Strong proficiency in Python, including writing maintainable code that other engineers will read and extend
Solid database fundamentals and an intuition for designing schemas that hold up as requirements evolve
Working understanding of how ML training pipelines consume data, and an eye for designing upstream systems that serve them well
Comfortable working in large codebases and modern build/dev environments (Bazel, monorepos, dev containers, or similar)
Curious, flexible, and pragmatic — able to pick up unfamiliar tools and reason from first principles rather than relying on prior recipes
Eligible to work in the United States

Nice To Haves

Experience working with data in an autonomous vehicle, robotics, or similar context
Familiarity with Foxglove, rerun, or similar visualization/data-platform tooling
Experience designing or maintaining data catalogs, metadata stores, or feature stores
Background in handling high-volume multi-modal data (video, point clouds, time-series) at terabyte-plus scale
Cloud data engineering experience (GCP or AWS — object storage, serverless triggers, batch processing)
Comfort operating as an early team member — high ownership, low ego, fast iteration

Responsibilities

Build and maintain pipelines that ingest, validate, and process multi-modal sensor logs from our vehicles
Design schemas and data models that make our driving data discoverable and queryable for ML training, evaluation, and debugging
Turn raw driving data into the derived signals, annotations, and aggregates that downstream teams consume
Write tooling that the broader team relies on day-to-day: data loaders, query interfaces, dataset assembly utilities
Collaborate closely with ML, vehicle software, curation, and fleet operations to make sure data flows smoothly from collection through to model training
Contribute to the design of our data stack, making decisions that scale with the team and the fleet