Staff Machine Learning Engineer, Training Runtime Performance

Nuro · Mountain View, CA
$235,030 - $352,290

About The Position

Nuro is a self-driving technology company on a mission to make autonomy accessible to all. Founded in 2016, Nuro is building the world’s most scalable driver, combining cutting-edge AI with automotive-grade hardware. Nuro licenses its core technology, the Nuro Driver™, to support a wide range of applications, from robotaxis and commercial fleets to personally owned vehicles. With technology proven over years of self-driving deployments, Nuro gives automakers and mobility platforms a clear path to AVs at commercial scale, empowering a safer, richer, and more connected future.

Requirements

  • B.S./M.S./Ph.D. in Computer Science, Electrical Engineering, or related technical field (or equivalent experience).
  • 4+ years of professional experience in ML infrastructure, distributed training, or ML systems engineering, including scaling models on multi-node, multi-accelerator clusters.
  • Understanding of training, evaluation, and distillation workflows for billion-parameter models.
  • Expert-level knowledge in distributed systems and (remote) Python.
  • Strong skills in profiling, debugging, and optimizing quantized workloads.
  • Experience with ML compilers and strategies to reduce startup overhead.
  • Familiarity with model distillation and efficient inference workflows.

Nice To Haves

  • Previous contributions to open source ML infra projects or research publications in ML systems.
  • Hands-on experience with foundation model infrastructure.
  • High proficiency in C++, distributed systems, and ML framework and distributed training library internals (e.g., NCCL, Horovod, DeepSpeed, Ray).

Responsibilities

  • Collaborate with ML practitioners and other infrastructure teams to understand their needs and integrate optimized input pipelines seamlessly into their workflows.
  • Detect, diagnose, and resolve performance bottlenecks across training, eval, and model distillation workflows.
  • Optimize training performance and resource utilization, and ensure consistent, reproducible model training outcomes.
  • Optimize input data pipelines to increase runtime goodput, ensuring accelerators maximize their 'time on task' and minimize idle cycles.
  • Champion best practices for robust, reproducible, and debuggable ML experimentation.

Benefits

  • Annual performance bonus
  • Equity
  • Competitive benefits package