Senior Annotation and Data Pipeline Manager

Genesis • San Carlos, CA

About The Position

Genesis is a full-stack robotics company working to make general-purpose robots a reality. The company is backed by prominent investors and partners, with a team across the Bay Area and Europe developing robots capable of performing real work. Genesis recently unveiled Eno, a wheeled robot powered by its GENE foundation model, and is beginning to deploy it with customers.

This role is critical to the data engine that transforms raw human demonstrations into data that measurably improves the foundation model. It involves building and scaling a data pipeline and annotation operation, managing datasets and ontologies, and using vision-language models for annotation and data synthesis. The goal is to scale the pipeline through automation rather than through headcount alone, delivering high-quality, training-ready data for both internal use and partner-funded collection efforts.

Requirements

Scaled an annotation or data pipeline at a serious operation.
Four or more years in data or ML pipelines, including time leading the work.
Experience at a frontier AI lab or top data operation, taking raw robot or embodied data to training-ready at volume.
Strong Python (Pandas, NumPy, PyTorch) and SQL skills.
Ability to write automation that shrinks the pipeline.
ML literacy, including an understanding of training vs. test splits, precision and recall, and overfitting.
Hands-on technical leadership, able to run a labeling operation and remain a hands-on contributor.
Comfortable with ambiguity and speed, moving fast in a research-paced environment and bringing order to it.

Responsibilities

Run the data engine, owning the loop from raw trajectory and video to training-ready datasets with validation steps.
Own datasets and ontology, deciding what gets annotated and how, and designing the ontology with the model team.
Automate with models, using vision-language models for automated trajectory annotation, language grounding, and data synthesis.
Run the annotation operation, scaling labeling (internal and vendor) against quality bars and delivery schedules.
Close the loop by turning real-robot evaluation failures into targeted collection and annotation jobs, and proving data improves the model.
Own the metrics, tracking inter-annotator agreement, label error rate, and throughput per annotator-hour, and driving improvements in each.
