Senior Solutions Architect - Networking

NVIDIA
$184,000 - $287,500 · Remote

About The Position

NVIDIA is looking for a Senior Networking (ETH/IB) Solutions Architect to join its CSP Networking SA team. CSPs and hyperscalers around the world are using NVIDIA products to revolutionize deep learning and data analytics and to power data centers. Join the team building many of the largest and fastest AI/HPC systems in the world! We are looking for someone who can thrive on a dynamic, customer-focused team that requires excellent interpersonal skills. In this role, you will interact with customers, partners, and internal teams to analyze, define, and implement large-scale networking projects. The scope combines networking, system design, and automation, along with serving as the face to the customer!

Requirements

  • BS/MS/PhD or equivalent experience in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields.
  • 8+ years of professional experience in networking fundamentals, TCP/IP stack, InfiniBand fundamentals and data center architecture.
  • Proficiency in configuring, testing, validating, and resolving issues in Ethernet and InfiniBand networks, especially in medium to large-scale HPC/AI environments.
  • Advanced knowledge of HPC/AI networking protocols.
  • Hands-on experience with network switch/router platforms such as Cumulus Linux, SONiC, IOS, Junos OS, and EOS.
  • Strong focus on customer needs and satisfaction.
  • Self-motivated with leadership skills to work collaboratively with customers and internal teams.
  • Strong written, verbal, and listening skills are essential.

Nice To Haves

  • Familiarity with cloud networks (AWS, GCP, Azure) is a plus.
  • Linux or Networking Certifications.
  • Knowledge of link-level performance and diagnostics.
  • Experience with high-performance computing architectures.
  • Experience with GPU-focused hardware/software.

Responsibilities

  • Building AI/HPC infrastructure for a large CSP customer and their end users.
  • Supporting operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting.
  • Engaging in and improving the whole lifecycle of services—from inception and design through deployment, operation, and refinement.
  • Maintaining services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Providing feedback to internal teams, such as opening bugs, documenting workarounds, driving customer feature requirements, and suggesting improvements.

Benefits

  • Equity and benefits.

What This Job Offers

Job Type

Full-time

Career Level

Senior

Industry

Computer and Electronic Product Manufacturing

Education Level

Bachelor's degree
