ML Runtime Optimization Engineer

Applied Intuition•Sunnyvale, CA

Onsite

About The Position

Applied Intuition is seeking a software engineer with deep experience in optimizing ML models and deploying them on production-grade embedded runtime environments. The role involves working across the entire ML framework stack, including PyTorch, JAX, ONNX, TensorRT, CUDA, XLA, and Triton. The engineer will drive ML performance optimization on various technologies for ADAS/AD stacks, targeting deployment on embedded compute platforms. Responsibilities include developing compute usage strategies for efficient model inference, working on model pruning and quantization for memory-constrained platforms, collaborating with ML engineers and software developers, and establishing methodologies for profiling model performance on embedded compute platforms to identify bottlenecks.

Requirements

Bachelor's degree in Electrical Engineering, Computer Science, Mathematics, Physics, or a related field
3+ years of experience with ML accelerators and with GPU, CPU, and SoC architecture and microarchitecture
Strong software development skills with a focus on embedded programming
Experience profiling and optimizing model performance on embedded compute platforms
Experience working with deep learning frameworks (e.g., PyTorch, JAX, ONNX)

Nice To Haves

M.Sc. or PhD in an ML-related area
Experience building an ML optimization framework from scratch
Experience deploying ML solutions to embedded chips for real-time robotics applications

Responsibilities

Drive ML performance optimization on multiple technologies for on-road and off-road ADAS / AD stacks targeting deployment on a variety of embedded compute platforms
Develop compute usage strategies to optimize efficiency and latency of model inference for compute boards selected by our customers
Work on model pruning and quantization, and support deployment on memory-constrained platforms
Collaborate closely with ML engineers and software developers on technical efforts to find and optimize efficient model architectures
Set up methodologies to profile the model performance on target embedded compute platforms and identify performance bottlenecks as part of stack integration