About The Position

The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence. As a Machine Learning and System Optimization Engineer, you will allocate overall system capacity across the core perception models running on the robot and drive large initiatives that make inference more efficient by sharing components across the perception stack. You will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience compressing, accelerating, and deploying complex models, including LLMs, VLMs, or foundation models, for power- and thermal-constrained vehicle SoCs. In addition, you will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.

Requirements

  • Deep experience in system and performance optimization in CPU/GPU systems designed for low latency or high throughput.
  • Deep expertise in working with real-time systems and their constraints, such as processing latency, memory utilization, and memory bandwidth pressure.
  • Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference (INT8, FP8, FP4, BF16/FP16).
  • Proficiency in low-level programming for AI accelerators, specifically developing and optimizing custom ML ops and TensorRT plugins with efficient CUDA kernel implementations.
  • Production-level C++ (14/17/20) and Python programming skills, with experience developing concurrent, memory-safe, real-time inference code for edge devices.

Nice To Haves

  • Prior experience in high-performance robotics applications such as autonomous vehicles, drones, or robots.
  • Familiarity with SOTA autonomous driving perception algorithms (temporal 3D object detection, BEV, 3D occupancy networks) and multi-modal sensor processing (vision, LiDAR, radar).
  • Experience with end-to-end autonomous driving paradigms (VLM/VLA models, foundation models) and edge deployment technologies (e.g., TensorRT-LLM).

Responsibilities

  • Allocate and distribute system resources (CPU/GPU/interconnect) to various models and inference engines running on the robot.
  • Spearhead cross-cutting initiatives that allow for better compute utilization through sharing/fusing models and better scheduling strategies.
  • Optimize large-scale models (Multi-Modal Sensor Fusion models, LLMs, VLMs) using advanced quantization (PTQ, QAT), pruning, mixed-precision inference frameworks, and parameter-efficient fine-tuning (LoRA, QLoRA).
  • Architect and implement model conversion and compilation pipelines using TensorRT for edge deployment.
  • Write production-level, low-latency, and memory-safe C++ and CUDA code for real-time inference on vehicle systems.

Benefits

  • Paid time off (e.g., sick leave, vacation, bereavement)
  • Unpaid time off
  • Health insurance
  • Long-term care insurance
  • Long-term and short-term disability insurance
  • Life insurance
© 2026 Teal Labs, Inc