NVIDIA seeks a senior software engineer to join the AI Networking co-design and benchmark R&D team. In this role, the candidate is responsible for building and productizing machine learning (ML) tools, including tools that use ML-based combinatorial optimization and design space exploration (DSE) techniques. These tools will be used to optimize AI workloads across large GPU and CPU clusters so that system resources are used efficiently at data center scale. The role involves working on distributed deep learning, particularly within LLM training and inference stacks. A strong passion for collective communication and networking is desirable.
The candidate will interact with diverse hardware and platforms, such as Host Channel Adapters (HCAs), switches, CPUs, GPUs, and complete systems. The work also spans multiple software layers, from LLM applications and ML frameworks down to communication and computing libraries. The candidate will develop ML-based tools and methodologies for comprehensive performance analysis and optimization, potentially incorporating learning-based agentic techniques. This position offers an opportunity to contribute to the core infrastructure powering the next generation of large-scale AI systems.
Job Type
Full-time
Career Level
Senior