We are seeking a Member of Technical Staff, Machine Learning Kernels, to design, optimize, and benchmark high-performance compute kernels for modern machine learning workloads. This role is for a deeply technical engineer who enjoys working close to hardware: writing CUDA kernels, investigating subtle performance artifacts, building benchmarks, and serving as a go-to expert on accelerator behavior. You will act as a hands-on performance specialist, partnering closely with research, systems, and infrastructure teams to unlock efficiency gains on GPUs today and on other accelerators (e.g., TPU, Trainium) as we expand our hardware partnerships. This is an onsite role based in one of our offices in Santa Clara, CA or Boston, MA.
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed