About The Position

We are looking for experienced engineers to help build and scale next-generation AI infrastructure using PyTorch, one of the world’s most widely used deep learning frameworks. This role sits at the intersection of machine learning systems, compilers, and high-performance computing, enabling researchers and product teams to train and deploy large-scale models efficiently. You will work on core components of the PyTorch ecosystem, including model execution, distributed training, performance optimization, and developer experience.

Requirements

  • PhD or MSc degree in Computer Science, Applied Math, Physics, or related science or engineering field (or equivalent experience)
  • 8+ years of software development experience
  • Strong programming skills in C++ and Python
  • Deep understanding of deep learning frameworks, preferably PyTorch
  • Experience with GPU programming (CUDA or similar) and performance optimization

Nice To Haves

  • Contributions to PyTorch core or ecosystem libraries
  • Experience with NVIDIA AI stack (TensorRT, Triton Inference Server, cuBLAS, cuDNN, NCCL)
  • Familiarity with ML compilers (TorchInductor, Triton, XLA, TVM)
  • Experience optimizing LLMs or large-scale recommendation / vision models
  • Background working closely with hardware-aware software optimization

Responsibilities

  • Design and build core PyTorch capabilities across runtime, autograd, distributed training, and model execution
  • Optimize performance across GPU/accelerator backends (CUDA, Triton, etc.)
  • Contribute to or lead development of large-scale ML systems and infrastructure
  • Improve model training efficiency, scalability, and reliability across multi-node environments
  • Work on compilers / graph transformations / kernel optimizations to accelerate deep learning workloads
  • Partner with researchers and applied teams to translate cutting-edge models into production systems
  • Drive open-source contributions and collaborate with the broader PyTorch community
  • Influence roadmap and architecture for next-gen AI platforms
  • Work at the forefront of AI and accelerated computing
  • Make a direct impact on how PyTorch runs on the world’s most advanced GPU platforms
  • Collaborate across hardware, systems software, and AI research to push performance boundaries and enable breakthroughs in generative AI, autonomous systems, and high-performance computing

Benefits

  • Competitive salaries
  • Generous benefits package
  • Equity


What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Number of Employees: 5,001-10,000
