Evaluation Infrastructure plays a critical role at Nuro, directly enabling L4 driverless deployment. The team supports two demanding workloads: day-to-day Autonomy Evaluation, which powers rapid software iteration, and large-scale Driverless Safety Validation, which produces the rigorous evidence required to deploy autonomy on public roads.

The Evaluation Infrastructure team builds the metrics framework, evaluation pipelines, introspection tooling, and analysis products that turn raw on-road and simulation logs into actionable insight. Our metrics stack spans both heuristic and ML-based approaches, covering everything from low-level component accuracy to end-to-end behavior quality. Working in close partnership with Simulation and the broader AI Platform, the platform lets autonomy and Systems & Safety teams run complex evaluations and validations across a wide range of configurations and scales, producing the high-fidelity metrics that drive both short-term iteration and long-term release confidence.

As the Technical Lead, you will guide the team with technical depth and rigor. You will set the technical bar and shorten both the time-to-signal for evaluation and the time-to-confidence for validation, so that autonomy and Systems & Safety teams can iterate fast while deploying software safely.
Job Type: Full-time
Career Level: Senior