Robotics Data Pipeline Engineer – Multimodal Data

Persona AI Inc · Houston, TX
Hybrid

About The Position

Persona AI is developing and commercializing rugged, multi-purpose humanoid robots that perform real work. Persona’s founding team has a decades-long history in humanoid robotics, bionics, and product development, delivering robust hardware that has touched the stars, worked miles below the surface of the ocean, and even roamed Disney Parks. Our mission is focused squarely on shipping beautiful, reliable products at massive scale, while building a customer-focused team to achieve these aims.

Delivering on that mission requires an unprecedented volume of high-quality, multimodal data. We are moving beyond basic teleoperation to leverage massive datasets of in-the-wild egocentric video combined with dense sensor streams (IMU, haptics, kinematics, and high-fidelity force profiles). We are seeking a highly skilled Data Pipeline Engineer to architect the systems that turn this raw, unstructured multimodal data, including critical force-aware data collections, into high-fidelity training assets for our robots.

Requirements

  • B.S., M.S., or Ph.D. in Computer Science, Data Engineering, Machine Learning, Robotics, or a related field.
  • Deep expertise in Python and extensive experience with PyTorch, specifically in handling custom dataloaders for multimodal datasets.
  • Experience analyzing and processing complex time-series data from force-torque (F/T) sensors, load cells, or tactile arrays, ensuring pristine alignment with visual frames (a minimal alignment sketch follows this list).
  • Mastery of video processing pipelines and libraries (OpenCV, FFmpeg, Decord) and managing the I/O bottlenecks of terabyte-scale video datasets.
  • Hands-on experience with 3D hand tracking, human pose estimation (e.g., MediaPipe), and spatial geometry calculations.
  • Strong understanding of modern imitation learning paradigms, VLA architectures, and frameworks focused on human-to-robot transfer (e.g., EgoScale, EgoMimic, or OpenVLA).
  • Proven ability to implement programmatic and generative data augmentation techniques for computer vision and time-series data.
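
To make the dataloader and alignment requirements above concrete, below is a minimal sketch, assuming pre-decoded frames and a 6-axis F/T stream, of a PyTorch Dataset that interpolates force-torque samples onto video-frame timestamps. The names (FTAlignedClips, interp_ft_to_frames) and the 30 Hz / 333 Hz rates are illustrative assumptions, not an existing internal API.

```python
# A minimal, self-contained sketch (not Persona's production pipeline) of
# timestamp-aligning a force-torque (F/T) stream to video frames inside a
# PyTorch Dataset. Frames are assumed to be pre-decoded uint8 arrays.
import numpy as np
import torch
from torch.utils.data import Dataset


def interp_ft_to_frames(frame_ts, ft_ts, ft_values):
    """Linearly interpolate each F/T channel onto the video-frame timestamps."""
    # ft_values: (N, C) samples at times ft_ts; frame_ts: (F,) seconds.
    return np.stack(
        [np.interp(frame_ts, ft_ts, ft_values[:, c]) for c in range(ft_values.shape[1])],
        axis=1,
    )


class FTAlignedClips(Dataset):
    """Yields fixed-length clips of (frames, per-frame F/T) pairs."""

    def __init__(self, frames, frame_ts, ft_ts, ft_values, clip_len=16):
        self.frames = frames                                        # (F, H, W, 3) uint8
        self.ft = interp_ft_to_frames(frame_ts, ft_ts, ft_values)   # (F, 6)
        self.clip_len = clip_len

    def __len__(self):
        return len(self.frames) - self.clip_len + 1

    def __getitem__(self, i):
        sl = slice(i, i + self.clip_len)
        clip = torch.from_numpy(self.frames[sl]).permute(0, 3, 1, 2).float() / 255.0
        ft = torch.from_numpy(self.ft[sl]).float()
        return clip, ft


if __name__ == "__main__":
    # Synthetic stand-in data: 100 frames at 30 Hz, 6-axis F/T sampled at 333 Hz.
    frames = np.zeros((100, 64, 64, 3), dtype=np.uint8)
    frame_ts = np.arange(100) / 30.0
    ft_ts = np.arange(1111) / 333.0
    ft_values = np.random.randn(1111, 6).astype(np.float32)

    clip, ft = FTAlignedClips(frames, frame_ts, ft_ts, ft_values)[0]
    print(clip.shape, ft.shape)  # torch.Size([16, 3, 64, 64]) torch.Size([16, 6])
```

A production pipeline would decode frames lazily (e.g., via FFmpeg or Decord) and handle dropped or out-of-order samples, but interpolating the faster sensor stream onto frame timestamps is the core alignment step this sketch illustrates.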

Nice To Haves

  • Experience with NVIDIA’s robotic software stack (Isaac, Cosmos, or components of the GR00T framework).
  • Familiarity with distributed data processing systems (Ray, Apache Spark) for cluster computing.
  • Background in generating or utilizing synthetic robotic data via simulation (Omniverse, MuJoCo).
  • Experience integrating spatial awareness or tactile data representations (e.g., Fourier encoding) into visual pipelines.

Responsibilities

  • Architect highly efficient, scalable pipelines to ingest, decode, and synchronously process thousands of hours of high-resolution egocentric video alongside rich sensor streams (IMUs, force-torque sensors, tactile pads, and joint proprioception).
  • Develop sophisticated post-processing algorithms to analyze force interactions and infer unobservable or missing states from raw data. This includes calibrating and cleaning direct force-aware data collections, estimating contact forces from object deformation, tracking occluded objects during complex manipulation, and applying inverse kinematics to fill in missing joint trajectories.
  • Develop algorithms to translate 3D human hand tracking, wrist motion, and pose estimation into the specific 6DoF/joint-space coordinates of our humanoid’s end-effectors, relying on sensor fusion to ensure absolute precision.
  • Implement robust data augmentation strategies (spatial transformations, temporal scaling, synthetic viewpoints, and sensor noise injection) to expand expert trajectories and improve the robustness of our learning models; a sketch of two of these strategies follows this list.
  • Work closely with the Hardware Teleoperation Team (UMI & Console operators) to perfectly align human-robot play-data (haptics, force profiles, video, audio, telemetry) with large-scale pre-training datasets.
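
As an illustrative companion to the augmentation bullet above, the following sketch shows two of the listed strategies, sensor-noise injection and temporal scaling, applied so that the video frames and the paired F/T stream stay synchronized after resampling. The function and parameter names (inject_sensor_noise, temporal_scale, noise_std, speed) are assumptions made for this example, not an existing API.

```python
# An illustrative sketch of two augmentation strategies for paired video + F/T
# clips: Gaussian sensor-noise injection and temporal scaling.
import torch
import torch.nn.functional as F


def inject_sensor_noise(ft, noise_std=0.05):
    """Add zero-mean Gaussian noise to a (T, 6) force-torque clip."""
    return ft + noise_std * torch.randn_like(ft)


def temporal_scale(frames, ft, speed=1.25):
    """Resample a paired clip to a new length so both modalities stay in sync.

    frames: (T, C, H, W) float tensor; ft: (T, 6) float tensor.
    """
    new_t = max(2, int(round(frames.shape[0] / speed)))
    # Interpolate frames along time using a (1, C, T, H, W) layout.
    frames_rs = F.interpolate(
        frames.permute(1, 0, 2, 3).unsqueeze(0),
        size=(new_t, frames.shape[2], frames.shape[3]),
        mode="trilinear",
        align_corners=False,
    ).squeeze(0).permute(1, 0, 2, 3)                # back to (new_T, C, H, W)
    # Interpolate the F/T stream along time using a (1, 6, T) layout.
    ft_rs = F.interpolate(
        ft.t().unsqueeze(0), size=new_t, mode="linear", align_corners=False
    ).squeeze(0).t()                                # (new_T, 6)
    return frames_rs, ft_rs


if __name__ == "__main__":
    clip = torch.rand(16, 3, 64, 64)
    ft = torch.randn(16, 6)
    clip_aug, ft_aug = temporal_scale(clip, inject_sensor_noise(ft), speed=1.25)
    print(clip_aug.shape, ft_aug.shape)  # (13, 3, 64, 64) and (13, 6)
```

Spatial transformations and synthetic viewpoints would follow the same principle: any change applied to the frames must be mirrored, or explicitly accounted for, in the paired sensor streams.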

Benefits

  • Competitive compensation
  • Excellent benefits
  • Flexible work environment
  • Equity opportunities