Senior Research Scientist, NLP Systems

NVIDIA
Santa Clara, CA

About The Position

We are looking for Research Scientists passionate about systems for large generative models! We are searching for world-class researchers in deep learning systems to join our applied deep learning research team, which pioneered Megatron and MT-NLG. Our team designs and builds the systems used to train and deploy state-of-the-art large language and multimodal foundation models, employing techniques from both supercomputing and distributed computing. We collaborate with internal and external partners, including hardware and software teams, to train and serve some of the world's largest and best foundation models as efficiently as possible. If you are passionate about the research and technologies revolutionizing generative AI and want to explore creative new paradigms for training and serving models on the latest compute platforms and clusters, this team will be a great fit for you.

Requirements

  • PhD in Electrical Engineering, Computer Science / Engineering, Computational Science or a related STEM field (or equivalent experience).
  • 4+ years of relevant work experience, ideally including extensive research or engineering on large-scale systems for deep learning.
  • Thorough understanding of compute system concepts (latency/throughput bottlenecks, pipelining, multiprocessing, etc.) and related efficiency analysis and tuning.
  • Knowledge of machine learning / deep learning concepts.
  • Excellent programming skills in rapid-prototyping languages such as Python, systems languages such as C++, and parallel programming (e.g., CUDA).
  • Expertise with deep learning frameworks such as PyTorch.

Responsibilities

  • Tackle large-scale distributed systems capable of performing end-to-end AI training and inference.
  • Develop, design, and implement algorithms to accelerate training and inference of large language models (LLMs) and large multi-modal models.
  • Work with deep learning hardware teams to co-design hardware and software for future model architectures.
  • Work closely with product and hardware architecture teams to integrate research and developments into products.
  • Contribute to open-source software and publish at top conferences.

Benefits

  • You will be eligible for equity and benefits.