About The Position

Do you thrive on taking a strategic product from launch to go‑to‑market at scale across the world’s largest customers? NVIDIA is looking for an Infrastructure Solutions Architect to lead deployment and bring‑up of our next‑generation Data Center GPUs and networking platforms. As part of the NVIDIA Solutions Architecture team, you will navigate uncharted technical and organizational spaces, serving as the bridge between early platform readiness, cloud engineering teams, product strategy, and large‑scale customer deployments. We are looking for Solutions Architects who combine hands‑on infrastructure expertise with cross-functional leadership to accelerate adoption of NVIDIA technologies across worldwide cloud hosting providers and large enterprise environments.

Requirements

  • BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or similar, or equivalent experience.
  • 4+ years of experience in Solutions Architecture, Infrastructure Engineering, or similar technical roles.
  • Hands‑on experience with bring‑up and validation of large‑scale NVIDIA GPU platforms, including multi‑GPU and multi‑node architectures.
  • Understanding of high‑performance networking technologies (e.g., RDMA, congestion control, high‑bandwidth interconnects) and their role in distributed AI workloads.
  • Familiarity with NVIDIA system software stacks: CUDA, NCCL, NVSwitch/NVLink, driver behavior, and performance tuning.
  • Proficiency with Linux systems tools for identifying issues and evaluating system performance, such as: dmesg, journalctl, lspci, numactl, ethtool, iostat, perf, nvidia-smi, top/htop, ipmitool, container‑level tooling, and related utilities.
  • Understanding of server hardware architecture, including PCIe topologies, system firmware, NUMA, BIOS/UEFI configuration, power/thermal envelopes, and memory/subsystem behavior.
  • Understanding of BMC/IPMI/Redfish for remote management, hardware health monitoring, and out‑of‑band debugging during early‑stage bring‑up.
  • Strong Linux fundamentals across drivers, kernel subsystems, cgroups, containers, and node‑level performance analysis.
  • Ability to identify performance bottlenecks at the cluster, node, accelerator, network, or application layer.

Nice To Haves

  • Outstanding interpersonal skills and the ability to build clarity and direction across diverse, fast‑paced technical teams.
  • Knowledge of compute and networking infrastructure (e.g., instance types, networking primitives, high‑performance communication paths) at Hyperscalers or Cloud Service Providers.
  • Demonstrated leadership resolving multi‑team infrastructure challenges across engineering, product, and customer groups.
  • A consistent record of taking GPU or infrastructure products from pilot to high‑volume deployment in large data center environments.
  • Familiarity with modern deep learning, LLM architectures, and distributed training/inference challenges at scale.

Responsibilities

  • Lead end‑to‑end execution for Hyperscaler customers to rapidly bring NVIDIA Data Center GPU and networking platforms to market at scale.
  • Drive strategic partnership and alignment with Product teams to understand roadmap intent, co‑define critical metrics, and ensure unified direction across technical, sales, and leadership organizations.
  • Influence without authority across Product, Engineering, Sales, Operations, and CSP customers, driving clarity and alignment and unblocking paths to scale‑up.
  • Analyze deployment and performance data, identifying product health trends, system bottlenecks, and operational risks.
  • Solve challenging technical problems involving GPUs, networking, drivers, containers, firmware, and distributed system interactions.
  • Deliver streamlined executive‑level communication on status, risks, progress, and required decisions.
  • Collaborate with Product and Engineering, enabling future improvements in platform design, validation, and operational workflows.

Benefits

  • You will be eligible for equity and benefits.