ML Runtime Optimization Engineer

Applied Intuition, Sunnyvale, CA
Onsite

About The Position

Applied Intuition is seeking a software engineer with deep experience in optimizing ML models and deploying them on production-grade embedded runtime environments. The role involves working across the entire ML framework stack, including PyTorch, JAX, ONNX, TensorRT, CUDA, XLA, and Triton. The engineer will drive ML performance optimization on various technologies for ADAS/AD stacks, targeting deployment on embedded compute platforms. Responsibilities include developing compute usage strategies for efficient model inference, working on model pruning and quantization for memory-constrained platforms, collaborating with ML engineers and software developers, and establishing methodologies for profiling model performance on embedded compute platforms to identify bottlenecks.

Requirements

  • Bachelor's degree in Electrical Engineering, Computer Science, Mathematics, Physics, or a related field
  • 3+ years of experience with ML accelerators, GPU, CPU, SoC architecture and micro-architecture
  • Strong software development skills with a focus on embedded programming
  • Experience profiling and optimizing model performance on embedded compute platforms
  • Experience working with deep learning frameworks (e.g., PyTorch, JAX, ONNX)

Nice To Haves

  • M.Sc. or PhD in an ML-related area
  • Built an ML optimization framework from scratch
  • Deployed ML solutions to embedded chips for real-time robotics applications

Responsibilities

  • Drive ML performance optimization on multiple technologies for on-road and off-road ADAS / AD stacks targeting deployment on a variety of embedded compute platforms
  • Develop compute usage strategies to optimize efficiency and latency of model inference for compute boards selected by our customers
  • Work on model pruning and quantization, and support deployment on memory-constrained platforms
  • Collaborate closely with ML engineers and software developers to identify and optimize efficient model architectures
  • Set up methodologies to profile model performance on target embedded compute platforms and identify performance bottlenecks as part of stack integration

Benefits

  • Base salary
  • Equity in the form of options and/or restricted stock units
  • Comprehensive health, dental, vision, life and disability insurance coverage
  • 401(k) retirement benefits with employer match
  • Learning and wellness stipends
  • Paid time off