Senior ML Inference Engineer - Platform

General Motors
Washington, DC
Remote

About The Position

The Model Deployment & Inference Solutions team in GM AV deploys machine learning models from training frameworks (e.g., PyTorch) onto autonomous vehicle hardware. Our mission is twofold: build the ML deployment platform that makes model rollouts fast and predictable, and optimize models so they meet the real-time latency and memory budgets required to run on-vehicle. Our work is on the critical path of GM's publicly committed 2028 launch of hands-free, eyes-off autonomous driving, debuting on the Cadillac Escalade IQ and building on Super Cruise's billion-plus hands-free miles.

This role sits in the team's Platform pillar. We own the unified ML deployment platform that automates the path from a trained model to inference on the vehicle, along with the developer-experience and agentic-tooling layer that makes deployment self-serve for every ML model development team at GM.

Requirements

  • BS, MS, or PhD in Computer Science or a related technical field.
  • 3+ years of relevant industry experience.
  • Strong fundamentals and excellent coding ability in Python.
  • Experience building or operating production platform or infrastructure systems where reliability, observability, and extensibility matter.
  • Experience with ML model deployment, inference integration, model optimization workflows, or model serving infrastructure, including at least one role or project where you owned the path from a trained model to a running inference workload.
  • Experience using coding agents (Cursor, Claude Code, GitHub Copilot, or equivalent) as part of your engineering workflow.
  • Experience designing clean, well-tested software with clear interfaces and good abstractions.
  • Strong cross-team collaboration skills.

Nice To Haves

  • Experience building agentic or LLM-powered developer tooling.
  • Experience with ML or workflow orchestration frameworks (Airflow, Temporal, Flyte, Ray, Kubeflow, or equivalent).
  • Familiarity with the NVIDIA GPU stack at the integration level (CUDA-aware Python, TensorRT, Triton Inference Server, torch.compile, ONNX); a brief sketch of this kind of integration work follows this list.
  • Experience with inference-serving frameworks (Triton, TorchServe, Ray Serve, vLLM) or edge-deployment toolchains.
  • Experience with low-latency or real-time systems.
  • Experience in autonomous vehicles, robotics, or other safety-critical ML deployment domains.
  • Open-source contributions to PyTorch, Ray, Airflow, Temporal, vLLM, TensorRT, or related projects.
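
To make the integration-level expectation in the NVIDIA-stack bullet above concrete, here is a minimal, illustrative Python sketch of the kind of workflow involved. The toy model, shapes, and tolerances are assumptions for illustration only, not GM internals:

    import torch

    # Toy model standing in for a trained network (assumption for illustration).
    model = torch.nn.Sequential(
        torch.nn.Linear(256, 512),
        torch.nn.ReLU(),
        torch.nn.Linear(512, 10),
    ).eval()
    example = torch.randn(1, 256)

    # torch.compile path: JIT-compile the model for faster inference.
    compiled = torch.compile(model)

    # ONNX export path: serialize the model for downstream toolchains
    # such as TensorRT or Triton Inference Server.
    torch.onnx.export(model, (example,), "model.onnx", opset_version=17)

    # Numerical parity check between eager and compiled outputs.
    with torch.no_grad():
        torch.testing.assert_close(model(example), compiled(example),
                                   rtol=1e-4, atol=1e-4)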

Responsibilities

  • Design, build, and operate the ML deployment platform that automates the path from trained model to on-vehicle inference.
  • Drive cross-organization model deployments to the autonomous vehicle stack, partnering with model development teams to take high-value models from training to production on-vehicle.
  • Build agentic tools that diagnose and fix deployment-blocking issues, automating workflows currently performed manually by engineers.
  • Build the developer experience that ML model development teams use day to day: tooling, dashboards, automation, and observability.
  • Drive shift-left validation that surfaces deployment risk (compile, runtime, parity, latency) early in the model development cycle; a sketch of such a validation gate follows this list.
  • Build platform tools that integrate the work of our sister teams (kernels, compiler, reduced precision and parity) so their optimization wins land directly in the deployment workflow.
  • Partner with the team's Performance pillar and model development teams across the AV organization.
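
As a hedged illustration of the shift-left validation described above, the sketch below gates a model on compile, runtime, parity, and latency checks. The budget and tolerance values are assumptions for illustration, not GM requirements:

    import time
    import torch

    LATENCY_BUDGET_MS = 10.0  # assumed on-vehicle budget, illustrative only
    PARITY_ATOL = 1e-3        # assumed parity tolerance, illustrative only

    def shift_left_validate(model: torch.nn.Module, example: torch.Tensor) -> float:
        model.eval()
        compiled = torch.compile(model)          # 1. compile check
        with torch.no_grad():
            out = compiled(example)              # 2. runtime check
            torch.testing.assert_close(          # 3. parity check vs. eager
                model(example), out, rtol=0, atol=PARITY_ATOL)
            start = time.perf_counter()          # 4. rough latency check
            for _ in range(100):
                compiled(example)
            latency_ms = (time.perf_counter() - start) / 100 * 1e3
        assert latency_ms <= LATENCY_BUDGET_MS, f"{latency_ms:.2f} ms over budget"
        return latency_ms

Running a gate like this in CI, before a model ever reaches vehicle hardware, is what lets deployment risk surface while it is still cheap to fix.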

Benefits

  • Medical
  • Dental
  • Vision
  • Health Savings Account
  • Flexible Spending Accounts
  • Retirement savings plan
  • Sickness and accident benefits
  • Life insurance
  • Paid vacation & holidays
  • Tuition assistance programs
  • Employee assistance program
  • GM vehicle discounts