Senior ML Inference Engineer - Platform

GM | Sunnyvale, CA
$128,700 - $261,300 | Remote

About The Position

The Model Deployment & Inference Solutions team in GM AV deploys machine learning models from training frameworks (e.g., PyTorch) onto autonomous vehicle hardware. This role is within the Platform pillar, responsible for the unified ML deployment platform that automates the process from a trained model to inference on the vehicle. It also includes the developer-experience and agentic-tooling layer that makes deployment self-serve for all ML model development teams at GM. This work is critical for GM's autonomous driving launch in 2028.

Requirements

  • BS, MS, or PhD in Computer Science or a related technical field.
  • 3+ years of relevant industry experience.
  • Strong fundamentals and excellent coding ability in Python.
  • Experience building or operating production platform or infrastructure systems where reliability, observability, and extensibility matter.
  • Experience with ML model deployment, inference integration, model optimization workflows, or model serving infrastructure, with at least one prior context where you owned the path from a trained model to a running inference workload.
  • Experience using coding agents (Cursor, Claude Code, GitHub Copilot, or equivalent) as part of your engineering workflow.
  • Experience designing clean, well-tested software with clear interfaces and good abstractions.
  • Strong cross-team collaboration skills.

Nice To Haves

  • Experience building agentic or LLM-powered developer tooling.
  • Experience with ML or workflow orchestration frameworks (Airflow, Temporal, Flyte, Ray, Kubeflow, or equivalent).
  • Familiarity with the NVIDIA GPU stack at the integration level (CUDA-aware Python, TensorRT, Triton inference server, torch.compile, ONNX).
  • Experience with inference-serving frameworks (Triton, TorchServe, Ray Serve, vLLM) or edge-deployment toolchains.
  • Experience with low-latency or real-time systems.
  • Experience in autonomous vehicles, robotics, or other safety-critical ML deployment domains.
  • Open-source contributions to PyTorch, Ray, Airflow, Temporal, vLLM, TensorRT, or related projects.

Responsibilities

  • Design, build, and operate the ML deployment platform that automates the path from trained model to on-vehicle inference.
  • Drive cross-organization model deployments to the autonomous vehicle stack, partnering with model development teams to take high-value models from training to production on-vehicle.
  • Build agentic tools that diagnose and fix deployment-blocking issues, automating workflows currently performed manually by engineers.
  • Build the developer experience that ML model development teams use day to day: tooling, dashboards, automation, and observability.
  • Drive shift-left validation that surfaces deployment risk (compile, runtime, parity, latency) early in the model development cycle.
  • Build platform tools that integrate the work of our sister teams (kernels, compiler, reduced precision and parity) so their optimization wins land directly in the deployment workflow.
  • Partner with the team's Performance pillar and model development teams across the AV organization.

Benefits

  • Medical
  • Dental
  • Vision
  • Health Savings Account
  • Flexible Spending Accounts
  • Retirement savings plan
  • Sickness and accident benefits
  • Life insurance
  • Paid vacation & holidays
  • Tuition assistance programs
  • Employee assistance program
  • GM vehicle discounts