NVIDIA is seeking a motivated Deep Learning engineer to integrate advanced CUDA features and Distributed Runtime technologies into AI stacks, including PyTorch, TRT-LLM, vLLM, SGLang, JAX, and others. You will join the team responsible for core CUDA features and runtimes that scale Deep Learning and HPC applications. The role involves addressing diverse multi-GPU demands, from training at scales of up to 100K GPUs to inference with microsecond latency. Your work will enhance both the productivity and performance of AI applications, accelerating their adoption by the community. This is a significant opportunity for individuals with an AI background to contribute to state-of-the-art advancements.
Job Type
Full-time
Career Level
Senior