Senior ML Infrastructure Engineer (Compute)

GM • Sunnyvale, CA

19d

About The Position

About the Team: The AI Validation Platform team owns the cloud-agnostic, reliable, and cost-efficient platform that powers GM’s AV efforts. We’re proud to serve as the infrastructure platform for teams developing autonomous vehicles (L3/L4/L5). Our platform supports the simulated validation of state-of-the-art (SOTA) machine learning models, with a focus on performance, availability, concurrency, and scalability. We enable rapid innovation and development by prioritizing high-impact, ML-centric use cases.

About the Role: We are seeking a Senior ML Infrastructure Engineer to help build and scale robust Compute platforms for Simulation workflows. In this role, you will focus on scaling the platform, improving efficiency, and driving high utilization of cutting-edge GPUs, while also raising the platform’s reliability. The ideal candidate brings experience designing, building, and running scalable distributed systems, strong problem-solving skills, a bias for action, and a get-it-done attitude, and will rapidly test and promote ideas. You will play a key role in shaping the architecture, roadmap, and user experience of a robust service supporting our AI Validation / Simulation needs. This is a high-impact opportunity to influence the future of AI infrastructure at GM.

Requirements

4+ years of industry experience, with a focus on high-performance backend services.
Strong expertise in Go or a similar programming language.
Experience working with cloud platforms such as GCP, Azure, or AWS.
Experience delivering cross-functional initiatives.
Strong communication skills and a proven ability to drive cross-functional initiatives.
Ability to thrive in a dynamic, multi-tasking environment with ever-evolving priorities.

Nice To Haves

Hands-on experience with cloud VM services such as Google Compute Engine.
Experience with hardware-in-the-loop validation systems.
Experience with high-performance computing (HPC).
Experience working with or designing interfaces and clients for developer workflows.
Familiarity with telemetry and other feedback loops that inform product improvements.
Familiarity with hardware acceleration (GPUs) and related optimizations.

Responsibilities

Design and implement core platform backend software components.
Collaborate with Simulation engineers, ML engineers, and researchers to understand critical workflows, translate them into platform requirements, and deliver incremental value.
Lead technical decision-making on Compute architecture, cloud capacity provisioning, caching, and auto-scaling mechanisms.
Drive the development of monitoring, observability, and metrics to ensure reliability, performance, and resource optimization.
Proactively research and integrate frameworks, hardware accelerators, and distributed computing techniques.
Lead large-scale technical initiatives across GM’s ML infrastructure.
Raise the engineering bar through technical leadership and by establishing best practices.

Benefits

From day one, we're looking out for your well-being, at work and at home, so you can focus on realizing your ambitions.
Learn how GM supports a rewarding career that rewards you personally by visiting the Total Rewards resources.