Staff ML Engineer, Inference Platform

General Motors · Warren, MI
Posted 65 days ago · Hybrid

About The Position

We are seeking a Staff ML Infrastructure Engineer to help build and scale robust compute platforms for ML workflows. In this role, you'll work closely with ML engineers and researchers to ensure efficient model serving and inference in production across workflows such as data mining, labeling, model distillation, and simulation. This is a high-impact opportunity to influence the future of AI infrastructure at GM. You will play a key role in shaping the architecture, roadmap, and user experience of a robust ML inference service supporting real-time, batch, and experimental inference needs. The ideal candidate brings experience designing distributed systems for ML, strong problem-solving skills, and a product mindset focused on platform usability and reliability.

Requirements

  • 8+ years of industry experience, with a focus on machine learning systems or high-performance backend services.
  • Expertise in Go, Python, C++, or another relevant programming language.
  • Expertise in ML inference and model serving frameworks (e.g., Triton, Ray Serve, vLLM); see the sketch after this list.
  • Strong communication skills and a proven ability to drive cross-functional initiatives.
  • Experience working with cloud platforms such as GCP, Azure, or AWS.
  • Ability to thrive in a dynamic, multi-tasking environment with ever-evolving priorities.
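
As a concrete illustration of the serving-framework experience called out above, here is a minimal offline-inference sketch using vLLM's Python API. The model name and sampling settings are assumptions chosen for illustration, not requirements from this posting.

    # Minimal vLLM offline-inference sketch (illustrative; the model name
    # and sampling settings below are assumptions, not from the posting).
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # any Hugging Face causal LM works here
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    prompts = ["Explain continuous batching in one sentence:"]
    outputs = llm.generate(prompts, params)

    for out in outputs:
        # Each RequestOutput pairs the prompt with its generated completions.
        print(out.outputs[0].text)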

Nice To Haves

  • Hands-on experience building ML infrastructure platforms for model serving/inference.
  • Experience working with or designing interfaces, APIs, and clients for ML workflows.
  • Experience with the Ray framework and/or vLLM.
  • Experience with distributed systems and large-scale data processing.
  • Familiarity with telemetry and other feedback loops that inform product improvements.
  • Familiarity with hardware acceleration (GPUs) and optimizations for inference workloads.
  • Contributions to open-source ML serving frameworks.

Responsibilities

  • Design and implement core platform backend software components.
  • Collaborate with ML engineers and researchers to understand critical workflows, translate them into platform requirements, and deliver incremental value.
  • Lead technical decision-making on model serving strategies, orchestration, caching, model versioning, and auto-scaling mechanisms (see the sketch after this list).
  • Drive the development of monitoring, observability, and metrics to ensure reliability, performance, and resource optimization of inference services.
  • Proactively research and integrate state-of-the-art model serving frameworks, hardware accelerators, and distributed computing techniques.
  • Lead large-scale technical initiatives across GM's ML ecosystem.
  • Raise the engineering bar through technical leadership, establishing best practices.
  • Contribute to open-source projects; represent GM in relevant communities.
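
To make the serving and auto-scaling responsibilities above concrete, here is a minimal Ray Serve deployment sketch with replica autoscaling. The deployment name and replica bounds are assumptions for illustration; they are not specified in the posting.

    # Minimal Ray Serve autoscaling sketch (illustrative; the deployment
    # name and replica bounds are assumptions, not from the posting).
    from ray import serve

    @serve.deployment(
        autoscaling_config={"min_replicas": 1, "max_replicas": 4},
    )
    class EchoModel:
        def __call__(self, request):
            # A real deployment would run model inference here; this stub
            # just echoes a query parameter back to the caller.
            return {"echo": request.query_params.get("text", "")}

    serve.run(EchoModel.bind())
    # Ray Serve now routes HTTP traffic to the deployment and scales
    # replica count between the configured bounds based on request load.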

What This Job Offers

Job Type: Full-time
Career Level: Mid Level
Industry: Transportation Equipment Manufacturing
Education Level: No Education Listed
Number of Employees: 5,001-10,000 employees
