Solutions Architect, AI Infrastructure

NVIDIA•Toronto, ON

13h•Hybrid

About The Position

NVIDIA is seeking an experienced AI Infrastructure Solutions Architect (SA) to bridge the design and deployment of large-scale GPU infrastructure. As part of the NVIDIA SA organization, you will work with customers, partners, and internal teams to analyze, define, and implement large-scale AI/HPC projects, and offer recommendations to business and engineering teams on our product roadmap.

Requirements

BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, Mathematics, or other Engineering fields or equivalent experience.
5+ years in Solution Engineering (or a similar role in Sales Engineering, Cloud Engineering, or Solution Architecture), including experience working directly with partners and customers.
System-level expertise in CPU/GPU server architecture, NICs, Linux, system software, and kernel drivers.
Experience with networking switches for Ethernet/InfiniBand, and data center infrastructure (power/cooling).
Knowledge of DevOps/MLOps technologies such as Docker/containers and Kubernetes.
Efficient time management and the ability to balance multiple tasks.
Excellent presentation, communication and collaboration skills.
Self-starter with a passion for growth, continuous learning, and sharing insights.

Nice To Haves

Familiarity with NVIDIA GPUs, NVIDIA Networking technologies (e.g. NICs, RoCE, InfiniBand), and systems technology such as NCCL, DCGM, UFM, Mission Control, and Base Command Manager.
Experience with bring-up and deployment of large GPU clusters, including deploying and optimizing high-speed networks (InfiniBand/Ethernet), with a clear understanding of how network architecture impacts GPU cluster performance.
Systems engineering, coding, and debugging skills including experience with C/C++, Linux kernel and drivers.
Experience working with enterprise developers and strong customer-facing skills.

Responsibilities

Work with NVIDIA Cloud Partners in Canada on large data center GPU server and networking system deployments.
Guide customer discussions on network design and compute/storage, and support bring-up of server/network/cluster deployments.
Visit customer data centers during the bring-up phase.
Become the primary technical driver for customers during the design, development, construction, integration, and production of GPU Cloud infrastructure and applications throughout the entire customer lifecycle.
Act as the customer's trusted advisor, conducting regular technical meetings covering the product roadmap, cluster issue debugging, feature discussions, and introductions to new technology solutions.
Partner with other SAs, Account Managers, Engineering, Product, and business leaders to align on strategies, assess technical needs, and secure business opportunities for NVIDIA.
Analyze and debug compute/network configuration and performance issues to deliver performant clusters.
Prepare and deliver technical content to customers, including presentations, workshops, reference architectures, tutorials, and publications.

Benefits

Highly competitive salaries
Comprehensive benefits package
Equity

What This Job Offers

Job Type

Full-time

Career Level

Senior
