AI Inference Engineer

sauron.systems · San Francisco, CA

About The Position

We’re looking for an AI Inference Engineer who lives at the boundary of high-performance software and physical hardware. In this role, you won't just be managing pipelines; you’ll be squeezing every drop of performance out of silicon to ensure our perception systems can see, think, and act in real time. You will own the productionization of AI: taking sophisticated models and transforming them into lightning-fast, production-ready engines running on edge devices in homes across the country. If you are obsessed with CUDA kernels, TensorRT optimizations, and the challenge of deploying robust vision systems on real robots, we want to talk to you.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, Robotics, or a related field.
  • 3+ years of experience developing and deploying computer vision or machine learning applications on real-world robotic systems (not just in simulation).
  • High proficiency in C, C++, and Python, with a focus on real-time and embedded systems.
  • Expert-level knowledge of the NVIDIA Jetson ecosystem (JetPack SDK, DeepStream, TensorRT) and a deep understanding of CUDA/GPU architecture.
  • Hands-on experience with video streaming tools such as FFmpeg and protocols such as RTSP, RTP, and HLS.
  • Proven track record of deploying AI systems that operate in the field, handling the unpredictability of real-world sensor data.

Nice To Haves

  • Familiarity with NVIDIA’s broader robotics stack.
  • Experience with ML compilers or compiler-level optimizations for GPU inference.
  • Specific background in sensor fusion and AI-driven obstacle avoidance for autonomous navigation.
  • Exposure to remote logging, log ingestion, and distributed telemetry aggregation.
  • Previous experience in early-stage startups or fast-paced hardware/software integration environments.

Responsibilities

  • Lead the development and optimization of low-latency inference engines using TensorRT and ONNX, including authoring custom plugins to support cutting-edge architectures.
  • Design and maintain multithreaded video processing and streaming pipelines (RTSP, RTP, HLS) using GStreamer and DeepStream.
  • Collaborate closely with embedded engineers to integrate perception software with Yocto platforms, ensuring seamless hardware-software synergy.
  • Work with raw data from cameras and LiDAR to enable real-time data capture, obstacle detection, and avoidance.
  • Write and optimize custom CUDA kernels and perform low-level GPU tuning to maximize throughput and minimize power consumption.
  • Productionize proven prototypes, moving them from JetPack-based development environments into Yocto builds.
  • Apply advanced optimization techniques, including quantization (INT8/FP16), pruning, and distillation, to bring research-grade models to production-grade efficiency.