Senior Autonomy Data Engineer

Torc Robotics • Blacksburg, VA

About The Position

Torc is a leader in autonomous driving technology, focused on developing software for automated trucks. As a Senior Autonomy Data Engineer, you will design, build, and operate the data infrastructure that powers our autonomy program: the pipelines, storage systems, and tooling that transform raw vehicle sensor logs into the curated, structured datasets our perception, planning, and simulation engineers depend on. This is a high-ownership position on a lean team, tackling the complex challenge of reliably moving large-scale sensor data off vehicles and processing it for model training.

Requirements

Bachelor’s degree in Computer Science, Computer Engineering, Software Engineering, Electrical Engineering, or a related field with 6+ years of data engineering experience, or a Master’s degree with 4+ years.
Strong proficiency in Python and SQL, with demonstrated ability to build production-quality data pipelines.
Deep experience with cloud data infrastructure (AWS preferred: S3, Glue, Athena, Redshift, or equivalent) and infrastructure-as-code tools (Terraform, CloudFormation).
Solid understanding of data partitioning strategies and columnar storage formats (Parquet, ORC, etc.).
Experience building and operating data pipelines that process time-series and binary data.
Proven ability to evaluate and integrate open-source tooling when appropriate versus building from scratch.
Strong instincts for delivering data quality through first-class implementations of monitoring, validation, and lineage tracking.

Nice To Haves

Experience with autonomous vehicles, robotics, or other sensor-driven autonomous systems.
Deep experience with Foxglove or Rerun beyond basic playback, e.g. building custom extensions or integrating them into a structured log review or annotation QA workflow.
Familiarity with the MCAP CLI and/or Python library, and experience converting MCAP data to columnar formats for further querying and processing.
Experience with data curation for ML training, e.g. diversity sampling, pseudo-labeling, and dataset versioning.

Responsibilities

Own the design and organization of the program’s data lake, including schema definitions, partitioning strategy, and metadata indexing.
Design and maintain end-to-end pipelines that ingest high-bandwidth sensor logs from vehicles into cloud storage with high reliability, tolerant of ad-hoc and intermittent connectivity.
Develop data validation and integrity checks to detect corrupted information, missing sensors, and inconsistent calibration before data is processed by downstream systems.
Implement retention, tiering, and lifecycle policies for data to balance storage costs with development value.
Build tooling that queries raw logs and produces curated training and evaluation datasets.
Build automation to run cost-effective pseudo-labeling workflows at the scale of data ingest.
Implement data quality and model performance metrics to direct labeling effort toward the highest-value examples.
Deploy and maintain data visualization tooling for log review, annotation QA, and autonomy debugging.
Build integrations between visualization tooling and the data lake so engineers can navigate from dataset entries or model failures back to the originating log data.
Work with autonomy engineers to define and surface custom visualization panels and implement metrics for analyzing unstructured operating environments.
Build dashboards providing autonomy engineers visibility into data coverage by terrain type, operating environment, and geographic region.
Establish and document data contracts between data services and model training consumers.
Partner with perception, planning, and embedded engineers across the data lifecycle, from shaping logging schemas and collection triggers to defining dataset interfaces for model training and evaluation.
Define data engineering standards, best practices, and tooling choices.
Contribute to the data roadmap and provide input to technical leadership on investment priorities.
Mentor junior engineers and raise the team’s capabilities in data infrastructure scalability and operational hygiene.

Benefits

A competitive compensation package that includes a bonus component and stock options
100% paid medical, dental, and vision premiums for full-time employees
401(k) plan with a 6% employer match
Flexibility in schedule and generous paid vacation (available immediately after start date)
Company-wide holiday office closures
AD&D and life insurance
Sign-on payments
Relocation assistance
