Our team's mission is to make PyTorch models high-performing, deterministic, and stable through a robust foundational framework that supports the latest hardware, without sacrificing the flexibility and ease of use of PyTorch.

We are seeking a PhD Research Intern to work on next-generation Mixture-of-Experts (MoE) systems for PyTorch, focused on substantially improving end-to-end training and inference throughput on modern accelerators (e.g., NVIDIA Hopper and beyond). The internship will explore novel combinations of communication-aware distributed training and kernel- and IO-aware execution optimizations (inspired by SonicMoE and related work) to unlock new performance regimes for large-scale sparse models. The project spans systems research, GPU kernel optimization, and framework optimization, with opportunities for open-source contributions and publication.

Team scope:
- Improve PyTorch out-of-the-box performance on GPUs, CPUs, and other accelerators
- Vertical performance optimization of models for training and inference
- Model optimization techniques, such as quantization, for improved efficiency
- Improve the stability and extensibility of the PyTorch framework

Our internships are twelve (12) to twenty-four (24) weeks long, with various start dates throughout the year.
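For context on the kind of sparse model this role targets, the following is a minimal, illustrative sketch of a top-k token-routed MoE layer in plain PyTorch. The class name TinyMoE and all sizes are assumptions chosen for illustration only; this is not the team's code, and a production MoE system would add load balancing, expert parallelism, and fused or communication-aware kernels.

```python
# Minimal illustrative sketch of a top-k token-routed MoE layer in plain PyTorch.
# All names and dimensions here are illustrative assumptions, not the team's code.
import torch
import torch.nn as nn


class TinyMoE(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to one token per row for routing
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                               # (tokens, experts)
        weights, indices = logits.softmax(dim=-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)      # renormalize top-k

        out = torch.zeros_like(tokens)
        for expert_id, expert in enumerate(self.experts):
            # Sparse dispatch: run only the tokens routed to this expert.
            token_idx, slot = (indices == expert_id).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = TinyMoE(d_model=64, d_hidden=256, num_experts=4, top_k=2)
    y = layer(torch.randn(2, 16, 64))
    print(y.shape)  # torch.Size([2, 16, 64])
```

The per-expert Python loop above is exactly the kind of dispatch that the research described here aims to replace with grouped, kernel- and IO-aware execution and overlapped communication at scale.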
Career Level: Intern
Education Level: Ph.D. or professional degree