Staff Machine Learning Engineer, Training Runtime Performance

Nuro
Mountain View, CA
78d
$235,030 - $352,290

About The Position

Nuro is a self-driving technology company on a mission to make autonomy accessible to all. Founded in 2016, Nuro is building the world's most scalable driver, combining cutting-edge AI with automotive-grade hardware. Nuro licenses its core technology, the Nuro Driver, to support a wide range of applications, from robotaxis and commercial fleets to personally owned vehicles. With technology proven over years of self-driving deployments, Nuro gives automakers and mobility platforms a clear path to AVs at commercial scale, empowering a safer, richer, and more connected future.

Requirements

  • B.S./M.S./Ph.D. in Computer Science, Electrical Engineering, or related technical field (or equivalent experience).
  • 4+ years of professional experience in ML infrastructure, distributed training, or ML systems engineering, scaling models on multi-node, multi-accelerator clusters.
  • Understanding of training, evaluation, and distillation workflows for billion-parameter models.
  • Expert-level knowledge in distributed systems and (remote) Python.
  • Strong skills in profiling, debugging, and optimizing quantized workloads.
  • Experience with ML compilers and strategies to reduce startup overhead.
  • Familiarity with model distillation and efficient inference workflows.

Nice To Haves

  • Previous contributions to open source ML infra projects or research publications in ML systems.
  • Hands-on experience with Foundation Model infrastructure.
  • High proficiency in C++, distributed systems, and ML framework internals (e.g., NCCL, Horovod, DeepSpeed, Ray).

Responsibilities

  • Collaborate with ML practitioners and other infrastructure teams to understand their needs and integrate optimized input pipelines seamlessly into their workflows.
  • Detect, diagnose, and resolve performance bottlenecks across training, eval, and model distillation workflows.
  • Optimize training performance and resource utilization, and ensure consistent, reproducible model training outcomes.
  • Optimize input data pipelines to increase runtime goodput, ensuring accelerators maximize their 'time on task' and minimize idle cycles.
  • Champion best practices for robust, reproducible, and debuggable ML experimentation.

Benefits

  • Annual performance bonus
  • Equity
  • Competitive benefits package

What This Job Offers

Career Level

Senior

Industry

Publishing Industries

Education Level

Master's degree

Number of Employees

501-1,000 employees
