Senior Data Engineer, AI & Robotics

General Motors, Warren, MI
Onsite

About The Position

The Senior Data Engineer, AI and Robotics will join the AI Research team within the Autonomous Robotics Center (ARC). This role owns the robotics data backbone that enables scalable robot learning in manufacturing — from data capture and curation through versioning, serving, and auditing. Your work will make model development reproducible, testable, and production-ready. This is a robotics and machine learning (ML) infrastructure role, focused on multimodal robotic datasets and continuous model iteration. You will partner closely with AI researchers, robotics engineers, and plant teams to turn real-world robot behavior and failures into high-quality training data and robust production systems.

Requirements

  • B.S. or M.S. in Computer Science, Computer Engineering, Data Engineering, or a related field.
  • 6+ years of experience building production data systems and/or ML infrastructure, with practical experience supporting training pipelines end-to-end.
  • Strong proficiency in Python and at least one of: C++, Scala, or Java.
  • Demonstrated engineering discipline in testing, documentation, and operational reliability.
  • Experience with dataset versioning, lineage, and reproducibility tooling (e.g., DVC or equivalent approaches).
  • Experience with experiment tracking and model registry patterns (e.g., MLflow or equivalent tools).
  • Ability to work onsite with hardware and robotics teams, and to design pipelines that handle real-world robotic logging constraints (e.g., bandwidth limits, dropped frames, timing drift).

Nice To Haves

  • Hands-on robotics logging and replay experience (e.g., ROS 2 bags, system telemetry pipelines).
  • Experience with simulation-to-real data workflows and dataset synthesis strategies.
  • Familiarity with data governance requirements and auditability in safety-adjacent or safety-critical systems.
  • Experience building tools to support data labeling workflows, quality assurance, and active learning loops.

Responsibilities

  • Build and operate multimodal data pipelines for robotics (e.g., vision, depth, force/torque, joint states, events, metadata), including reliable capture from lab and plant-adjacent cells.
  • Implement reproducible data logging and replay workflows (including ROS 2 bagging where applicable) to enable debugging, regression testing, and dataset construction.
  • Own dataset lifecycle management, including versioning and lineage, provenance, governance, and data quality gates to support trustworthy model training and evaluation.
  • Integrate experiment tracking and model/data traceability so teams can compare runs, reproduce results, and audit changes over time.
  • Establish MLOps automation patterns (CI/CD/CT-style pipelines for ML) that reduce manual toil and increase deployment confidence for robotics AI updates.
  • Partner with AI/ML, planning, and validation teams to define data contracts (schemas, labeling standards, failure taxonomies) and turn field failures into curated training datasets.