About The Position

This role is for a Solution Architect at NVIDIA, joining a team that is revolutionizing AI with data center scale solutions. The position involves designing, building, and maintaining large-scale HPC and AI infrastructure, and actively contributing to making AI Factories a reality. Solution Architects work closely with customers and partners to address industry problems, deploying and operationalizing AI solutions at scale.

The day-to-day work focuses on enabling partners to successfully adopt end-to-end AI solutions using NVIDIA's compute, networking, and software stacks. Specifically, this role requires a deep technical understanding of NVIDIA Reference Architectures in order to enable customers to adopt CPU-based solutions within the overall NVIDIA AI Factory. It is a multifaceted role that requires comfort working with hardware, software, the larger AI workflow, and the operationalization of large-scale compute resources. The goal is to help customers overcome barriers to adopting NVIDIA's best known methods. As the technical leader for CPU components within the NVIDIA AI Factory, this role is instrumental in driving success.

The team also emphasizes knowledge sharing through demos, proof-of-concepts, papers, and developer blogs, and collaborates with executives and engineering to solve complex problems and bring NVIDIA's technologies to life. The mission is to solve problems that no one else has solved yet.

Requirements

  • Experience with defining, deploying, and testing large-scale reference architectures for High-Performance Computing and AI
  • A track record of defining and using MLOps and AI workflow tools and processes
  • 6 or more years of hands-on expertise with modern data center architectures and the interactions among CPUs, GPUs, and networking
  • Strong engineering fundamentals and a BS or MS in Engineering, or equivalent experience
  • Strong analytical and problem-solving skills, along with an ability to articulate what you know to others
  • Ability to multitask efficiently in a multifaceted environment
  • Experience organizing, presenting, and discussing technical materials with groups that span a range of technical backgrounds
  • Flexibility to adapt in fluid situations, especially with partners or customers
  • Comfortable with occasional travel to customer sites

Nice To Haves

  • Hands-on experience with Arm-based server processors and the Arm software ecosystem
  • Proficiency with tooling, automation, and performance testing for large-scale clusters, preferably using AI tools
  • Deep understanding of Agentic AI and inference workflows
  • Experience building, using, and explaining reinforcement learning
  • Willingness and ability to learn quickly as we address sophisticated problems, and an understanding of how all elements of the AI Factory interact with each other

Responsibilities

  • Help partners be successful in their adoption of end-to-end AI solutions using NVIDIA's compute, networking, and software stacks
  • Use a deep technical understanding of NVIDIA Reference Architectures to enable customers to adopt CPU-based solutions as part of the overall NVIDIA AI Factory
  • Work on hardware and software elements, the larger AI workflow, and the operationalization of large-scale compute resources
  • Help customers overcome barriers to adopting NVIDIA's best known methods
  • Act as the technical leader for the CPU components within the NVIDIA AI Factory
  • Share knowledge with colleagues, delivering demos, assisting with proof-of-concepts, or writing papers and developer blogs
  • Collaborate with executives and engineering to tackle sophisticated problems and help bring NVIDIA's premier technologies to life
  • Solve problems that nobody else has solved yet

Benefits

  • Equity
  • Benefits
© 2024 Teal Labs, Inc