Engineering Manager - ML Data and Evaluation, Self-Driving Systems

Applied Intuition•Sunnyvale, CA

5d•$255,700 - $346,000•Onsite

About The Position

Applied Intuition is seeking an Engineering Manager to lead the data and evaluation layer that powers its end-to-end autonomy models. This role encompasses data enrichment and autolabeling, dataset curation and corpus management, evaluation and metrics infrastructure, and closed-loop systems connecting on-road performance to training. The team is responsible for the pace of model iteration, and the manager will be hands-on in technical decisions. The company is profitable, growing rapidly, and serves industries such as automotive, defense, trucking, construction, mining, and agriculture with its autonomy software.

Requirements

5+ years building ML or data systems for robotics or production software systems.
2+ years managing or technically leading engineering teams.
Experience with large-scale data pipelines: ingestion, curation, and processing of multi-modal sensor data.
Experience reasoning about dataset composition, distribution balance, and corpus-level quality - making data decisions that measurably improved model performance.
Strong software engineering in Python; comfort with C++ and distributed systems.

Nice To Haves

Shipped perception, prediction, or planning models to production vehicles.
Experience with state-of-the-art simulation for ML eval (e.g. neural rendering and simulation).
Labeling and auto-labeling pipelines: automated pre-labeling, quality verification, human-in-the-loop workflows.
RL and reward engineering for autonomous driving or robotics.

Responsibilities

Own data enrichment: the ML pipelines that produce semantic labels, object annotations, behavior tags, and derived features at petabyte scale across cameras, lidar, and radar. Ensure enrichment quality keeps pace with model requirements as they evolve.
Build the curation and corpus management systems: distribution analysis, targeted mining for long-tail scenarios, embedding-based data selection, scenario diversity and geographic balance enforcement.
Own evaluation from offboard metrics to on-road driving quality. Define the metrics, benchmarks, and regression tests that determine whether a model ships. Close the sim-to-real gap. Build "eval of eval" tooling to measure and improve the evaluation system itself.
Recruit, develop, and technically lead the team. Build a culture of rigor on a safety-critical system.