Onboard AV Software Engineer

Humble Robotics • San Francisco, CA

About The Position

We're looking for a software engineer to optimize and deploy ML models on our trucks' onboard compute, and to own performance across the full autonomous driving stack. You'll take models from our ML team and make them run fast, efficiently, and reliably on embedded GPUs, using TensorRT, custom CUDA kernels, and low-level systems engineering. Beyond inference, you'll profile and optimize the entire onboard software pipeline to meet hard real-time deadlines. This is a rare chance to bridge ML and embedded systems for production autonomous freight, with the freedom and responsibility that come with a small team tackling a massive problem.

Requirements

BS, MS, or PhD in Computer Science, Electrical Engineering, Robotics, or a related field—or equivalent industry experience
Strong proficiency in C++ and/or Rust for performance-critical systems
Hands-on experience with GPU-accelerated computing—CUDA, TensorRT, or similar inference optimization toolchains
Familiarity with ML model architectures (transformers, CNNs) and the ability to reason about computational cost and memory footprint
Eligible to work in the United States

Nice To Haves

Experience with onboard software for autonomous vehicles, robotics, or IoT/edge devices
Deep knowledge of CUDA, TensorRT, model quantization, and kernel-level optimization
Experience with Bazel or similar build systems for complex codebases
Familiarity with real-time robotic systems
Experience profiling and optimizing full-system performance (CPU, GPU, memory, I/O) on embedded platforms
Comfort operating as an early team member—high ownership, low ego, fast iteration

Responsibilities

Optimize and deploy neural network models for onboard inference using TensorRT and custom CUDA kernels
Profile and reduce end-to-end latency across the autonomous driving stack—from sensor ingestion to control
Build and maintain the onboard C++ and Rust software infrastructure, including real-time data pipelines, inter-process communication, and hardware abstraction layers
Implement model quantization, pruning, and other optimization techniques to maximize throughput on embedded GPU platforms
Collaborate with ML engineers to ensure models are designed for efficient deployment, and with vehicle systems engineers to meet real-time safety constraints