Senior/Staff Machine Learning Engineer, Training Runtime Performance

Nuro · Mountain View, CA
$235,030 - $352,290 · Onsite

About The Position

We are seeking a highly experienced Staff Software Engineer to join our ML Infrastructure team, focusing on optimizing training runtime efficiency and input pipelines for model training, evaluation, and distillation workloads. In this role, you will enable models to train faster and more efficiently, accelerating our self-driving roadmap for commercial and personal mobility.

Requirements

  • B.S./M.S./Ph.D. in Computer Science, Electrical Engineering, or related technical field (or equivalent experience).
  • 4+ years of professional experience in ML infrastructure, distributed training, or ML systems engineering, scaling models on multi-node, multi-accelerator clusters.
  • Understanding of training, evaluation, and distillation workflows for billion-parameter models.
  • Expert-level knowledge of distributed systems and Python.
  • Strong skills in profiling, debugging, and optimizing quantized workloads.
  • Experience with ML compilers and strategies to reduce startup overhead.
  • Familiarity with model distillation and efficient inference workflows.

Nice To Haves

  • Previous contributions to open source ML infra projects or research publications in ML systems.
  • Hands-on experience with Foundation Model infrastructure.
  • High proficiency in C++, distributed systems, and ML framework internals (e.g., NCCL, Horovod, DeepSpeed, Ray).

Responsibilities

  • Collaborate with ML practitioners and other infrastructure teams to understand their needs and integrate optimized input pipelines seamlessly into their workflows.
  • Detect, diagnose, and resolve performance bottlenecks across training, eval, and model distillation workflows.
  • Optimize training performance and resource utilization, and ensure consistent, reproducible model training outcomes.
  • Optimize input data pipelines to increase runtime goodput, ensuring accelerators maximize their "time on task" and minimize idle cycles.
  • Champion best practices for robust, reproducible, and debuggable ML experimentation.

Benefits

  • Annual performance bonus
  • Equity
  • Competitive benefits package