Principal System Cloud Architect

NVIDIA
$272,000 - $425,500

About The Position

We are seeking a Principal System Architect to lead the architectural vision for NVIDIA’s GPU Infrastructure-as-a-Service (IaaS) offerings. This strategic role focuses on defining reference architectures and system blueprints that integrate NVIDIA’s latest innovations, including GB200 Grace Blackwell systems, Spectrum-X, BlueField, InfiniBand, storage (block, file, object), and AI Enterprise software stacks, into scalable, high-performance cloud infrastructure for on-prem deployments, Neo Clouds, and CSPs. The role requires deep engagement across hardware, networking, orchestration, and partner ecosystems to define the future of GPU cloud services.

Requirements

  • 15+ years in system architecture, with deep experience in cloud-scale infrastructure, HPC, or AI platforms.
  • Proven expertise in GPU platforms, data center networking (InfiniBand, RoCE, Spectrum), virtual networking, storage, and orchestration technologies.
  • Strong understanding of Kubernetes, VM provisioning, bare-metal provisioning, and infrastructure automation.
  • MS or PhD in Computer Engineering, Electrical Engineering, or a related field, or equivalent experience.
  • Demonstrated ability to define, document, and present architectural designs and influence cross-functional teams.

Nice To Haves

  • Experience with NVIDIA technologies such as DGX, HGX, GB200, NVLink, NVSwitch, BlueField, Magnum IO, and Spectrum-X.
  • Deep knowledge of AI/ML workloads, distributed training architectures, and GPU scheduler integration.
  • Familiarity with CSP environments (AWS, Azure, OCI, GCP) and hybrid/multi-cloud architectures.
  • Participation in open standards and industry bodies (e.g., OCP, CNCF, Kubernetes SIGs).

Responsibilities

  • Define scalable, secure, and efficient architectures for GPU-based IaaS using NVIDIA’s full stack: DGX/HGX, GB200, NVLink/NVSwitch, InfiniBand, and Spectrum-X.
  • Work with internal engineering, cloud partners, and OEMs to define and publish validated reference architectures covering bare-metal provisioning, virtualization, storage fabrics, and networking.
  • Architect solutions for bare-metal-as-a-service, VMaaS, and container orchestration (Kubernetes), integrated with virtual networking (VPCs), InfiniBand fabrics, high-performance storage, and AI workloads.
  • Partner with silicon, platform, networking, and software teams to ensure alignment of architecture with NVIDIA’s roadmap for GPU, DPU, and AI services.
  • Represent NVIDIA in joint solution development with CSPs, OEMs, and hyperscale customers to align infrastructure strategies and deployment practices.
  • Make high-impact architectural decisions across performance, scalability, multi-tenancy, power efficiency, and manageability.

Benefits

  • Competitive salaries
  • Generous benefits package
  • Equity eligibility