About The Position

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

Requirements

  • Ph.D. or M.S. in Computer Science, Electrical Engineering, or a related field (or equivalent experience), with 12+ years of industry experience in high-performance computing (HPC) or distributed deep learning.
  • Parallelism Expertise: Deep understanding of 3D parallelism (Data, Tensor, Pipeline) and advanced strategies including Context Parallelism, Expert Parallelism, and Zero Redundancy Optimizer (ZeRO) variants.
  • Technical Proficiency: Deep expertise with NCCL, UCX, UCC, NVSHMEM, or MPI. Experience with RDMA, RoCE, and low-level InfiniBand verbs is required.
  • Inference & Serving: Advanced knowledge of high-throughput inference engines and schedulers, specifically TensorRT-LLM, vLLM, SGLang, and NVIDIA Dynamo.
  • GPU Architecture: Expert knowledge of the NVIDIA GPU memory hierarchy (HBM3e/HBM4, L2 cache) and CUDA programming models.

Nice To Haves

  • Framework Development: Hands-on experience developing within Megatron-Core, DeepSpeed, or JAX/XLA, with an understanding of how these frameworks interact with low-level communication runtimes.
  • Significant upstream contributions to major open-source projects (e.g., PyTorch Distributed, KServe, or Ray).
  • A proven track record of deploying and optimizing models on NVIDIA platforms or similar rack-scale systems.
  • A strong portfolio of patents or papers in top-tier systems/architecture venues (e.g., ISCA, ASPLOS, NeurIPS, SC).

Responsibilities

  • Architecture Leadership: Define the long-term technical roadmap for communication libraries across NVIDIA’s next-generation platforms. You will ensure the seamless scaling of models to clusters comprising hundreds of thousands of nodes.
  • AI Communication Library Design: Lead the development of next-generation communication primitives and collective algorithms. This includes optimizing for heterogeneous interconnects such as NVLink, Spectrum-X (Ethernet), and Quantum-X (InfiniBand).
  • Application-Communication Library Co-Design: Partner with application developers to architect and implement specialized communication primitives. You will ensure that AI and HPC libraries—including NCCL, NIXL, NVSHMEM, UCC, and UCX—evolve to meet the requirements of trillion-parameter models and Agentic AI.
  • Hardware/Software Co-Design: Collaborate with silicon architects and software engineers to influence hardware specifications for next-generation networking, ensuring they meet the evolving demands of trillion-parameter LLMs and Agentic AI.
  • Quantitative Modeling: Develop high-fidelity analytical models and simulators to predict system behavior under emerging workloads.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Principal
  • Education Level: Ph.D. or professional degree
  • Number of Employees: 5,001-10,000 employees
