About The Position

NVIDIA is looking for an experienced Senior Software Engineer to expand the US-based Networking Hyperscale Engineering Team. Are you craving an opportunity to work directly with top-tier cloud and AI customers, co-develop software that powers their AI superclusters, and influence NVIDIA’s NIC software roadmap? In this role you will do just that for NVIDIA’s high-performance networking stack spanning Linux kernel, RDMA/RoCE, DPDK, DOCA, NCCL, and NIC firmware. You will be among the first to design and optimize the NIC and communication paths for our next-generation GPU and NIC platforms and help define their role in the modern AI data center. You’ll work closely with some of the best SDK, driver, firmware, and GPU/NIC architects in the industry, as well as domain experts in large-scale training, collectives, and systems performance.

Requirements

  • 12+ years of overall experience in a similar or related systems/networking software role.
  • A Bachelor’s, Master’s or PhD in Software Engineering, Computer Science, Computer Engineering, Electrical Engineering, or a related field (or equivalent experience).
  • Deep C/C++ expertise, strong Linux systems knowledge, and hands-on experience with kernel networking / RDMA / NIC drivers or DPDK.
  • Proven experience developing and debugging network operating systems (NOS) and routing/switching protocols used in AI data centers (for example BGP, ECMP, EVPN/VXLAN).
  • Practical experience with DOCA, NIC firmware interfaces, or other hardware-accelerated networking stacks for large-scale systems.
  • Excellent communication skills and a track record of effective collaboration with developers, partners, and customers in dynamic environments.

Nice To Haves

  • Deep knowledge of Linux kernel / systems internals, SoC / SmartNIC / NIC embedded systems, and data center switches and NOS.
  • Hands-on experience with RDMA/RoCE, GPU-related networking (for example GPUDirect RDMA), and high-performance, low-latency data paths.
  • Background optimizing NCCL or other distributed training stacks on large GPU clusters for throughput and tail latency.
  • Experience working with hyperscalers or major cloud providers on strategic, performance-critical AI networking deployments.
  • Contributions to open-source networking, RDMA, DPDK, kernel, CUDA/NCCL, or related ecosystems.

Responsibilities

  • Co-developing NIC software and communication paths with strategic, top-tier customers to enable and scale large AI superclusters.
  • Designing and implementing high-performance C/C++ components on Linux using DPDK, kernel-bypass techniques, and RDMA/RoCE.
  • Developing and integrating kernel, driver, and NIC firmware features to improve throughput, latency, and reliability for AI workloads.
  • Working closely with NCCL and distributed training teams to tune end-to-end collectives performance over NVIDIA networking at scale.
  • Owning complex performance and functionality debug with customers and representing the team in cross-org architecture discussions.

Benefits

  • Competitive salaries
  • Generous benefits package
  • Equity