About The Position

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s motivated by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brain of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, inviting environment that encourages everyone to do their best work. Step into the team and explore how you can make a lasting impact on the world.

We are looking for a networking professional to join the NVIDIA Solution Architects team. The team supports NVIDIA’s AI factory deployments at various customer sites. Together, we will drive end-to-end integration of technology solutions with some of NVIDIA's most strategic technology customers. You will offer recommendations to customers and partners on our product upgrades. This dynamic role requires excellent interpersonal skills, along with the ability to analyze, define, implement, and troubleshoot large-scale networking projects with customers and internal teams.

Requirements

  • BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields, or equivalent experience.
  • 10+ years of experience in designing, managing, and supporting large-scale hybrid networks.
  • Experience with scripting is helpful.
  • Strong programming skills in at least one of the following languages: C, C++, or Python.
  • Practical experience identifying and resolving bottlenecks in large-scale training workloads or parallel applications.
  • Proven understanding of CPU and GPU architectures, CUDA, parallel filesystems, and high-speed interconnects.
  • Experience working with large compute clusters and an understanding of their internal scheduling and resource management mechanisms (e.g., SLURM or cloud-based clusters).
  • System-level understanding of server/rack-level architecture, BMC, PCIe devices, network adapters, the Linux OS, and kernel drivers.
  • Excellent communication and liaison skills to work with customers, partners, and internal functions.

Nice To Haves

  • Systems engineering, coding, and debugging skills, including experience with C/C++, the Linux kernel, and drivers.
  • Hands-on experience with NVIDIA systems/SDKs (e.g., CUDA), NVIDIA Networking technologies (e.g., DPU, RoCE, InfiniBand), and/or ARM CPU solutions.
  • Hands-on experience in Linux environments and with software-defined networking.
  • Experience with system board architectures and familiarity with x86, 64-bit, and low-level hardware programming.

Responsibilities

  • Assisting with deploying, debugging, and improving the efficiency of AI workloads on large-scale NVIDIA platforms.
  • Identifying hardware issues, tracking them through the bug-reporting process, and keeping customers updated on progress.
  • Benchmarking new framework features, analyzing performance, and sharing actionable insights with both customers and internal teams.
  • Working directly with external customers/partners to solve cluster performance and stability issues, identify bottlenecks, and implement effective solutions.
  • Building expertise and guiding customers in scaling workloads efficiently and reliably on the latest generation of NVIDIA GPUs.
  • Collaborating with AI factory deployment teams and ensuring that RAs/Blueprints are accurately followed and implemented.

Benefits

  • NVIDIA offers highly competitive salaries and a comprehensive benefits package.
  • As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com/
  • You will also be eligible for equity and benefits.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Education Level: Ph.D. or professional degree
  • Number of Employees: 5,001-10,000

© 2024 Teal Labs, Inc