Human Data Architect, Quality

Mecka • New York, NY

About The Position

Mecka AI is building the data and deployment infrastructure for embodied intelligence. We collect, curate, and license the world's most useful robotics training data to leading AI labs, and we deploy real robotic systems with enterprise customers across hospitality, retail, QSR, pharmacy, logistics, and healthcare. We work with the foundation model teams shaping the next decade of robotics, and with the operators running real businesses today. Quality, trust, and execution are core to our partnerships.

The Role

We're hiring a Human Data Architect, Quality, to be the person with taste for what robotics training data should look like at Mecka. You will define what good data is: the labeling rubrics, ontologies, schemas, sampling philosophy, and acceptance criteria that every dataset we ship is measured against. You decide what goes in or out of a dataset and why.

This is a standards-and-methodology architecture role, not a QA-management role. You set the quality bar; data operations and QA teams enforce it. Your output is the spec the entire data org and our customers run on.

You will work shoulder-to-shoulder with foundation-model researchers at our customers to translate model behavior into data structure: what to label, how to label it, how to organize it, how to compose a training set, what the edge cases are, and what makes a dataset trainable versus merely large.

Requirements

5+ years working at the intersection of ML and data — annotation methodology, dataset curation, data-centric ML, ground truth design, or labeling-specifications work for autonomy, vision, or multimodal teams.
Hands-on experience designing taxonomies, ontologies, or labeling schemas that fed production model training (not just internal analytics).
Strong data instincts: you can open a dataset in SQL, a notebook, or Python and tell us what's wrong with it within an hour.
Comfortable reading ML papers and translating model-architecture needs into data-structure choices.
You have built a labeling rubric, ontology, or ground-truth spec that a large annotation org executed against in production.
You have worked directly with research scientists at frontier AI labs or autonomy companies on what training data should contain.
Background in computer vision, robotics, cognitive science, linguistics, or a related field where taxonomy design is a craft.
You have strong opinions about data quality that you can defend with concrete examples.

Responsibilities

Define the labeling rubrics, severity levels, rejection taxonomies, and acceptance criteria for each customer program across video, sensor streams, trajectories, action labels, task outcomes, language grounding, and metadata.
Translate ambiguous customer requirements ("we want a model that can do X") into precise, measurable, executable data specifications.
Maintain customer-specific quality criteria and the canonical data dictionary every program references.
Build golden datasets, reference examples, and calibration tasks that define "correct" by demonstration, not just description.
Own the taxonomy, schema, and class hierarchies for robotics datasets — how attributes are structured, how temporal segmentation works, how event boundaries are defined, how ambiguity is handled, how edge cases are categorized.
Decide how data is organized end-to-end so it is trainable, queryable, and composable across customers and modalities.
Set dataset versioning conventions, schema evolution rules, and the data-organization philosophy the org runs on.
Own the philosophy for what goes into a dataset and what gets cut: distribution, diversity, edge-case representation, redundancy, license/provenance constraints.
Decide sampling strategies, balancing rules, and curation principles for each program.
Make taste-driven calls on what data is worth collecting at all — and push back when collection plans won't produce trainable data.
Define the acceptance bar that says "this dataset is ready to ship" — and hold it under deadline pressure.
Iterate rubrics and ontology based on model-failure signal from customers — your standards evolve with what models actually struggle to learn.
Run cross-customer reviews of recurring quality misses and translate them into standards improvements.
Partner with engineering on automated validation (schema completeness, duplicates, time sync, metadata coverage, model-assisted review) so the standard is enforceable at scale.