About The Position

About the team

We focus on developing high-performance GPU kernels and custom libraries that power on-vehicle inference for state-of-the-art ML models. Our charter is to make core AI workloads faster, more reliable, and easier to maintain and deploy. That includes building custom operators when vendor libraries fall short, integrating those kernels into our ML runtime stack, and consulting on performance and CUDA debugging across the AV software stack. We collaborate closely with AI Solutions, AI Compilers, AI Architecture, and AI Tooling to ensure models can be deployed efficiently to the car while meeting strict latency and reliability targets.

About the role

As an AI Kernels intern, you’ll work alongside experienced kernel, compiler, and performance engineers on real production problems: profiling GPU workloads, experimenting with new kernel implementations, and strengthening the performance and robustness of the AI stack behind GM’s next-generation autonomous and assisted driving features. You’ll design, implement, and benchmark CUDA kernels and supporting infrastructure, contributing to the GPU kernels and custom libraries that improve performance, reliability, and developer experience across our ML stack.

Requirements

  • Currently enrolled in a PhD program in Computer Science, Computer Engineering, Electrical Engineering, Applied Math / Computational Science or a related STEM field.
  • Availability to work full-time (40 hours per week) during the internship period.
  • Demonstrated coursework, research, or projects in GPU programming, parallel computing, high-performance computing (HPC), machine learning systems, or computer architecture.
  • Strong programming skills in C++.

Nice To Haves

  • Experience with CUDA/CUTLASS/CuTe or another accelerator programming framework, such as OpenCL.
  • Familiarity with GPU performance profiling tools (e.g., Nsight Systems, Nsight Compute, nvprof).
  • Experience with mixed-precision computation (FP16 / INT8) and performance–accuracy tradeoffs.
  • Knowledge of GPU-accelerated libraries (e.g., cub, cuBLAS, cuDNN, TensorRT) and when to use custom kernels vs. library calls (a brief illustrative sketch follows this list).
  • Background in parallel algorithms, numerical methods, or high-performance computing (HPC).
  • Prior research, publications, or coursework involving GPU acceleration or systems-level optimization.
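
For context, the sketch below is a minimal, hypothetical illustration of the kind of work several of these bullets describe: a hand-written CUDA kernel used where a library call is not a good fit, storing FP16 data while accumulating in FP32. All names are illustrative and assume nothing about GM’s actual codebase.

// Minimal sketch (hypothetical): a custom FP16 axpy-style kernel of the kind
// one might write when a library routine is not a good fit for the workload.
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cstdio>

__global__ void axpy_fp16(const __half* x, __half* y, __half a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Accumulate in FP32 to limit precision loss, then store back in FP16.
        float acc = __half2float(a) * __half2float(x[i]) + __half2float(y[i]);
        y[i] = __float2half(acc);
    }
}

int main() {
    const int n = 1 << 20;
    __half *x, *y;
    cudaMallocManaged((void**)&x, n * sizeof(__half));
    cudaMallocManaged((void**)&y, n * sizeof(__half));
    for (int i = 0; i < n; ++i) { x[i] = __float2half(1.0f); y[i] = __float2half(2.0f); }

    int block = 256, grid = (n + block - 1) / block;
    axpy_fp16<<<grid, block>>>(x, y, __float2half(0.5f), n);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", __half2float(y[0]));  // expect 2.5
    cudaFree(x);
    cudaFree(y);
    return 0;
}

Whether such a kernel beats a cuBLAS or CUTLASS path is exactly the kind of performance–accuracy question that profiling tools like Nsight Compute help answer.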

Responsibilities

  • Design and optimize GPU kernels and supporting libraries for core model operations used in on-vehicle inference.
  • Build and improve tooling and infrastructure that make it easier to profile, debug, and validate CUDA kernels and accelerator-backed code.
  • Help define and refine kernel requirements and priorities by working with partners in AI Solutions, Compilers, and Architecture, and turning them into concrete tasks and project plans.
  • Implement, benchmark, and iterate on CUDA-based solutions to get the most out of modern GPU hardware for real production workloads (see the timing sketch after this list).
  • Take on team-specific projects, which may include performance investigations, reliability improvements, or prototype explorations depending on current priorities.
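
As a rough illustration of the benchmarking mentioned above (not a prescribed workflow), a minimal CUDA-event timing harness might look like the following; the kernel and all names are hypothetical.

// Minimal sketch (hypothetical): timing a kernel with CUDA events, the kind of
// micro-benchmark used to compare a custom kernel against a library baseline.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float* y, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] *= a;
}

int main() {
    const int n = 1 << 24;
    float* y;
    cudaMalloc((void**)&y, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int block = 256, grid = (n + block - 1) / block;
    scale<<<grid, block>>>(y, 2.0f, n);  // warm-up launch
    cudaEventRecord(start);
    scale<<<grid, block>>>(y, 2.0f, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Effective bandwidth: one read plus one write of n floats.
    printf("%.3f ms, %.1f GB/s\n", ms, 2.0 * n * sizeof(float) / (ms * 1e6));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(y);
    return 0;
}

Wall-clock numbers like these are only a starting point; Nsight Systems and Nsight Compute fill in occupancy, memory throughput, and scheduling detail.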

Benefits

  • Paid US GM Holidays
  • GM Family First Vehicle Discount Program
  • Results-based potential for growth within GM
  • Intern events to network with company leaders and peers

What This Job Offers

  • Job Type: Full-time
  • Career Level: Intern
  • Education Level: Ph.D. or professional degree
  • Number of Employees: 5,001-10,000
