Senior Network Architect

NVIDIA · Santa Clara, CA
Hybrid

About The Position

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for 30 years. It’s an outstanding legacy of innovation, motivated by extraordinary technology and extraordinary people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPUs act as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, encouraging environment where everyone is inspired to do their best work. Come join our team and see how you can make a lasting impact on the world!

The NVIDIA Enterprise Network Architecture team is seeking experienced candidates across the broad domain of network architecture and engineering. This is a hands-on architecture position focused on the development and deployment of ultra-high-speed, resilient, and scalable interconnects for GPU-accelerated data centers and compute clusters. Outstanding problem-solving abilities, a comprehensive understanding of network security protocols and standards, routing, switching, and automation, and a deep grasp of fundamental network theory are critical to your success at NVIDIA.

Requirements

  • MS or PhD in Electrical Engineering, Computer Science, Computer Engineering, Artificial Intelligence, Data Science, Mathematics, Statistics, or equivalent experience.
  • 12+ years of experience building, managing, and supporting large-scale hybrid networks, and developing automation pipelines with Python, Ruby, Go, or other languages used in infrastructure automation (an illustrative sketch follows this list).
  • Expert-level knowledge of networking technologies: TCP/UDP, IPv4/IPv6, BGP/MP-BGP, VPN, L2 switching, EVPN, VXLAN, Segment Routing, MPLS, IS-IS, and DWDM.
  • Experience automating SDN/NFV/NFVI infrastructure.
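
As a rough illustration of the kind of infrastructure automation these requirements refer to (not a prescribed toolchain for the role), the following minimal Python sketch uses the Netmiko library to collect BGP session state from a list of switches. The hostnames, credentials, and platform string are placeholders.

```python
"""Minimal sketch: collect BGP summaries from a list of switches.

Assumes Netmiko is installed (pip install netmiko); all hostnames,
credentials, and platform strings below are placeholders.
"""
from netmiko import ConnectHandler

SWITCHES = ["leaf01.example.net", "leaf02.example.net"]  # hypothetical hosts


def bgp_summary(host: str) -> str:
    """Open an SSH session to one switch and return its raw BGP summary."""
    device = {
        "device_type": "cisco_nxos",   # placeholder platform
        "host": host,
        "username": "netops",          # placeholder credentials
        "password": "example-only",
    }
    with ConnectHandler(**device) as conn:
        return conn.send_command("show ip bgp summary")


if __name__ == "__main__":
    for switch in SWITCHES:
        print(f"=== {switch} ===")
        print(bgp_summary(switch))
```

In practice, a script like this would typically be one stage in a larger pipeline that parses the output and feeds alerting or configuration-drift checks.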

Responsibilities

  • Lead the architecture, design, and deployment of global‑scale backbone and data center fabrics that serve large fleets of CPU‑based compute, storage, and GPU/HPC clusters.
  • Design high‑performance DC fabrics using InfiniBand and high‑throughput Ethernet (RoCE and traditional IP) to support both general compute workloads and GPU‑dense AI/ML training and inference environments (a rough sizing sketch follows this list).
  • Engineer and optimize carrier interconnects, metro and long‑haul backbone, and dark‑fiber systems to provide low‑latency, loss‑minimal connectivity between regions, super labs, and data centers.
  • Partner with systems, OS, GPU, storage, and HPC platform teams to deliver scalable, highly available network architectures that can evolve with rapid growth in both compute and GPU capacity.
  • Implement and refine network monitoring, rich telemetry, and performance‑engineering practices across fabrics and backbone to detect issues early and continually improve end-to-end application experience.
  • Drive technology selection, vendor engagement, and lifecycle strategy for routing, optical, and data center switching platforms across both compute and GPU domains.
  • Define and enforce security, compliance, and reliability standards for all backbone and fabric components supporting sensitive enterprise and R&D workloads.
  • Collaborate with internal product and engineering teams to develop “NVIDIA on NVIDIA” reference architectures and best‑practice solutions for large‑scale compute and AI data centers.
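
As a back-of-the-envelope illustration of the fabric-design work described above (not NVIDIA's actual design rules), the short Python sketch below computes the leaf-to-spine oversubscription ratio of a two-tier Clos fabric from assumed port counts and speeds.

```python
def oversubscription_ratio(server_ports: int, server_speed_gbps: int,
                           uplink_ports: int, uplink_speed_gbps: int) -> float:
    """Ratio of downstream (server-facing) to upstream (spine-facing)
    bandwidth on a single leaf switch; 1.0 means non-blocking."""
    downstream = server_ports * server_speed_gbps
    upstream = uplink_ports * uplink_speed_gbps
    return downstream / upstream


# Hypothetical leaf: 32 x 400G server ports and 8 x 800G uplinks to the spine.
ratio = oversubscription_ratio(32, 400, 8, 800)
print(f"Oversubscription: {ratio:.1f}:1")  # -> 2.0:1
```

AI/ML training fabrics are typically engineered close to 1:1 (non-blocking), while general-purpose compute fabrics often tolerate higher ratios.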