About The Position

We are looking for a networking professional to join the NVIDIA Solutions Architects team! The team supports NVIDIA’s AI factory deployments at various customer sites. Together, we will drive end-to-end integration of technology solutions with some of NVIDIA's most strategic technology customers. You will offer recommendations to customers and partners on our product upgrades. This dynamic role requires excellent communication skills to analyze, define, implement, and troubleshoot large-scale networking projects with customers and internal teams.

Requirements

  • BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields, or equivalent experience.
  • 8+ years of experience in designing, managing, and supporting large-scale hybrid networks.
  • Experience with scripting is helpful.
  • Expert in networking technologies: TCP/UDP, IPv4/IPv6, BGP/MP-BGP, VPN, L2 switching, EVPN, VXLAN, Segment Routing, MPLS, IS-IS, DWDM.
  • Experience automating SDN/NFV/NFVI infrastructure.
  • System-level understanding of server/rack-level architecture, BMC, PCIe devices, Network Adapters, Linux OS, and kernel drivers.
  • Superb communication and liaison skills to work with customers, partners, and internal functions.

Nice To Haves

  • Advanced-level experience with Cisco / Arista / Juniper is a huge plus.
  • Cisco CCIE (Routing and Switching) certification and an understanding of fabric managers.
  • Hands-on experience in the Linux environment and software-defined networking.
  • Working knowledge of InfiniBand and storage HBAs.

Responsibilities

  • Deploy, lead, and maintain large-scale AI data centers, including the control, network, and storage stacks.
  • Troubleshoot network issues, which is a major part of this role.
  • Identify hardware issues and track them through bug resolution while keeping customers updated on progress.
  • Build high-performance DC fabrics using InfiniBand and high-throughput Ethernet (RoCE and traditional IP). These fabrics support general compute workloads and GPU-dense AI/ML training and inference environments.
  • Implement networking solutions, such as Spectrum switches, ConnectX network adapters, and BlueField DPUs.
  • Educate customers on future-proofing their infrastructure and help drive proofs of concept (POCs).
  • Collaborate with AI factory deployment teams and ensure RAs/Blueprints are accurately followed and implemented.