Senior ML Inference Engineer - Platform

General Motors
Washington, DC
Remote

About The Position

The Model Deployment & Inference Solutions team in GM AV deploys machine learning models from training frameworks (e.g., PyTorch) onto autonomous vehicle hardware. Our mission is twofold: build the ML deployment platform that makes model rollouts fast and predictable, and optimize models so they meet the real-time latency and memory budgets required to run on-vehicle. Our work is on the critical path of GM's publicly committed 2028 launch of hands-free, eyes-off autonomous driving, debuting on the Cadillac Escalade IQ and building on Super Cruise's billion-plus hands-free miles.

This role sits in the team's Platform pillar. We own the unified ML deployment platform that automates the path from a trained model to inference on the vehicle, along with the developer-experience and agentic-tooling layer that makes deployment self-serve for every ML model development team at GM.

Requirements

  • BS, MS, or PhD in Computer Science or a related technical field.
  • 3+ years of relevant industry experience.
  • Strong fundamentals and excellent coding ability in Python.
  • Experience building or operating production platform or infrastructure systems where reliability, observability, and extensibility matter.
  • Experience with ML model deployment, inference integration, model optimization workflows, or model serving infrastructure, including at least one role or project where you owned the path from a trained model to a running inference workload.
  • Experience using coding agents (Cursor, Claude Code, GitHub Copilot, or equivalent) as part of your engineering workflow.
  • Experience designing clean, well-tested software with clear interfaces and good abstractions.
  • Strong cross-team collaboration skills.

Nice To Haves

  • Experience building agentic or LLM-powered developer tooling.
  • Experience with ML or workflow orchestration frameworks (Airflow, Temporal, Flyte, Ray, Kubeflow, or equivalent).
  • Familiarity with the NVIDIA GPU stack at the integration level (CUDA-aware Python, TensorRT, Triton Inference Server, torch.compile, ONNX); a brief sketch of this kind of integration work follows this list.
  • Experience with inference-serving frameworks (Triton, TorchServe, Ray Serve, vLLM) or edge-deployment toolchains.
  • Experience with low-latency or real-time systems.
  • Experience in autonomous vehicles, robotics, or other safety-critical ML deployment domains.
  • Open-source contributions to PyTorch, Ray, Airflow, Temporal, vLLM, TensorRT, or related projects.
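
To make the integration-level expectation in the NVIDIA-stack bullet above concrete, here is a minimal, illustrative Python sketch of the kind of workflow involved. The toy model, shapes, and tolerances are assumptions for illustration only, not GM internals:

    import torch

    # Toy model standing in for a trained network (assumption for illustration).
    model = torch.nn.Sequential(
        torch.nn.Linear(256, 512),
        torch.nn.ReLU(),
        torch.nn.Linear(512, 10),
    ).eval()
    example = torch.randn(1, 256)

    # torch.compile path: JIT-compile the model for faster inference.
    compiled = torch.compile(model)

    # ONNX export path: serialize the model for downstream toolchains
    # such as TensorRT or Triton Inference Server.
    torch.onnx.export(model, (example,), "model.onnx", opset_version=17)

    # Numerical parity check between eager and compiled outputs.
    with torch.no_grad():
        torch.testing.assert_close(model(example), compiled(example),
                                   rtol=1e-4, atol=1e-4)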

Responsibilities

  • Design, build, and operate the ML deployment platform that automates the path from trained model to on-vehicle inference.
  • Drive cross-organization model deployments to the autonomous vehicle stack, partnering with model development teams to take high-value models from training to production on-vehicle.
  • Build agentic tools that diagnose and fix deployment-blocking issues, automating workflows currently performed manually by engineers.
  • Build the developer experience that ML model development teams use day to day: tooling, dashboards, automation, and observability.
  • Drive shift-left validation that surfaces deployment risk (compile, runtime, parity, latency) early in the model development cycle; a sketch of such a validation gate follows this list.
  • Build platform tools that integrate the work of our sister teams (kernels, compiler, reduced precision and parity) so their optimization wins land directly in the deployment workflow.
  • Partner with the team's Performance pillar and model development teams across the AV organization.
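
As a hedged illustration of the shift-left validation described above, the sketch below gates a model on compile, runtime, parity, and latency checks. The budget and tolerance values are assumptions for illustration, not GM requirements:

    import time
    import torch

    LATENCY_BUDGET_MS = 10.0  # assumed on-vehicle budget, illustrative only
    PARITY_ATOL = 1e-3        # assumed parity tolerance, illustrative only

    def shift_left_validate(model: torch.nn.Module, example: torch.Tensor) -> float:
        model.eval()
        compiled = torch.compile(model)          # 1. compile check
        with torch.no_grad():
            out = compiled(example)              # 2. runtime check
            torch.testing.assert_close(          # 3. parity check vs. eager
                model(example), out, rtol=0, atol=PARITY_ATOL)
            start = time.perf_counter()          # 4. rough latency check
            for _ in range(100):
                compiled(example)
            latency_ms = (time.perf_counter() - start) / 100 * 1e3
        assert latency_ms <= LATENCY_BUDGET_MS, f"{latency_ms:.2f} ms over budget"
        return latency_ms

Running a gate like this in CI, before a model ever reaches vehicle hardware, is what lets deployment risk surface while it is still cheap to fix.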

Benefits

  • Medical
  • Dental
  • Vision
  • Health Savings Account
  • Flexible Spending Accounts
  • Retirement savings plan
  • Sickness and accident benefits
  • Life insurance
  • Paid vacation & holidays
  • Tuition assistance programs
  • Employee assistance program
  • GM vehicle discounts