Principal Machine Learning Engineer

GM•Sunnyvale, CA

$197,600 - $374,200•Hybrid

About The Position

The Role: We are seeking a Principal AI Engineer to lead the design and advancement of our AI platform. You will play a key role in shaping the infrastructure that powers large-scale training and cloud inference. This includes accelerating training throughput, scaling multi-modal models, and enabling the next generation of AI-driven driving systems. We're tackling challenges across distributed training, training efficiency, DDP/FSDP, data processing pipelines, and PyTorch model optimization. This is a highly impactful position where your technical leadership will define how we scale AI to achieve autonomy.

Requirements

Bachelor’s degree or higher in Computer Science or a related field, or equivalent experience.
8+ years of professional software engineering experience.
4+ years of specialized experience in the AI/ML domain (e.g., enabling distributed training for large-scale models).
Strong programming skills in Python, with proficiency in frameworks such as PyTorch (preferred) or TensorFlow.
Experience with distributed systems, GPU computing, and cloud environments (AWS, GCP, or Azure).
Comfortable operating in highly ambiguous and dynamic environments.
Willingness to travel to Sunnyvale, CA as needed.

Nice To Haves

Proven track record of self-motivation, execution, and delivering impact.
Deep expertise with PyTorch 2.x+ and distributed training frameworks.
Strong skills in profiling, analyzing, debugging, and optimizing training performance (e.g., fusing operations, avoiding memory fragmentation).
Proficiency in C++ for performance-critical components.
Experience leading cross-functional projects and aligning diverse stakeholders on priorities.

Responsibilities

Architect, build, and optimize core AI/ML platform infrastructure to support massive-scale model training.
Collaborate with data scientists, ML engineers, and software developers to enable seamless workflows from research to production.
Drive efficiency in large-scale distributed training and data processing pipelines.
Establish best practices for reliability, scalability, and performance across the AI/ML platform.
Provide technical leadership and mentorship, guiding teams on platform design, architecture decisions, and emerging technologies.
Partner with cross-functional stakeholders to align platform capabilities with business needs and strategic AI initiatives.

Benefits

GM offers a variety of health and wellbeing benefit programs.
Benefit options include medical, dental, vision, Health Savings Account, Flexible Spending Accounts, retirement savings plan, sickness and accident benefits, life insurance, paid vacation & holidays, tuition assistance programs, employee assistance program, GM vehicle discounts and more.
