About The Position

The connectivity engineer translates product reference architectures and logical network diagrams into physical builds, applying NVIDIA's AI Factory build guidelines to NVIDIA's large-scale internal research clusters. This role acts as the lead engineer for all in-cluster cabling, pathway, and rack layout optimizations required to power global-scale AI deployments, ensuring the cluster is co-designed with facilities infrastructure (power and cooling) and infrastructure software. This role provides an outstanding opportunity to be at the forefront of NVIDIA's technology roadmap!

Requirements

  • Minimum of 12 years in a connectivity, network architecture, or engineering role within a hyperscale cloud provider, large-scale enterprise data center, or High-Performance Computing (HPC) environment.
  • BA or BS (or equivalent experience).
  • Consistent record of designing, deploying, and operating network fabrics for thousands of GPU/CPU nodes.
  • Deep expertise in high-speed interconnect technologies, including InfiniBand, RoCE, and RDMA.
  • Proven experience designing connectivity solutions for high-density GPU clusters (100kW+ per rack) and understanding the unique front-end and back-end requirements for AI training vs. inference.
  • Deep understanding of data center infrastructure, including rack power/cooling, cable management, and physical density constraints.
  • Demonstrated ability to lead multidisciplinary teams and complete sophisticated technical initiatives.

Nice To Haves

  • Deep expertise with NVIDIA's compute and network product families and deployment standards.
  • Comfortable operating at the intersection of network engineering, MEP systems, and Infrastructure-as-a-Service software layers.
  • Experienced with field deployments and/or global reference design documentation, ideally both.

Responsibilities

  • Own the development of connectivity reference designs based on requirements from cluster architecture, network engineering, infrastructure software and product hardware teams.
  • Build comprehensive documentation, including detailed rack elevations, network architecture diagrams, and cabling point-to-point lists.
  • Support projects throughout design and deployment phases.
  • Serve as the primary engineering support, closely collaborating with deployment and field teams to ensure successful cluster build-out and operation.
  • Strategically co-design the cluster with power and cooling infrastructure teams, ensuring a thorough understanding of all facility requirements (architectural, power, and cooling).
  • Work with hardware, network, and security teams to translate software stack requirements into physical requirements: hardware selection, fault domains, and network architecture.
  • Develop new solutions and products in the connectivity space to accelerate the deployment of large-scale AI Factories.

Benefits

  • You will be eligible for equity and benefits.