NVIDIA · Posted 30 days ago
Full-time • Mid Level
Santa Clara, CA
5,001-10,000 employees
Computer and Electronic Product Manufacturing

We're forming a team of innovators to roll out and enhance AI inference solutions at scale, leveraging NVIDIA GPU technology and Kubernetes. As a Solutions Architect (Inference Focus), you'll work closely with our engineering, DevOps, and customer success teams to drive enterprise AI adoption. Together, we'll bring generative AI to production!

What you'll be doing:

  • Help customers design, deploy, and maintain scalable, GPU-accelerated inference pipelines on Kubernetes for large language models (LLMs) and generative AI workloads.
  • Lead performance tuning with TensorRT/TensorRT-LLM, NVIDIA NIM, and Triton Inference Server to improve GPU utilization and model efficiency.
  • Collaborate with cross-functional teams (engineering, product) and provide technical mentorship to customers implementing AI at scale.
  • Architect zero-downtime deployments, autoscaling (e.g., HPA driven by custom metrics), and integration with cloud-native observability tools (e.g., OpenTelemetry, Prometheus, Grafana); a minimal autoscaling sketch follows this list.
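For illustration only (not part of the role description): a minimal sketch of the custom-metrics autoscaling pattern above, using the official Kubernetes Python client. The deployment name (`triton-inference`), namespace (`inference`), and the metric name (`nv_inference_queue_duration_us`, Triton's Prometheus queue-time metric, assumed to be exposed to the custom-metrics API via something like Prometheus Adapter) are all assumptions.

```python
# Sketch: create an autoscaling/v2 HPA that scales an inference deployment
# on a custom per-pod queue-latency metric. The deployment "triton-inference"
# and the metric "nv_inference_queue_duration_us" are assumptions.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="triton-inference-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="triton-inference"
        ),
        min_replicas=2,
        max_replicas=16,
        metrics=[
            client.V2MetricSpec(
                type="Pods",
                pods=client.V2PodsMetricSource(
                    metric=client.V2MetricIdentifier(
                        name="nv_inference_queue_duration_us"
                    ),
                    # Scale out when average queued time per pod exceeds ~50 ms.
                    target=client.V2MetricTarget(
                        type="AverageValue", average_value="50000"
                    ),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="inference", body=hpa
)
```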
What we need to see:

  • 5+ years in solutions architecture with a proven track record of moving AI inference from POC to production on Kubernetes.
  • Experience architecting GPU allocation using the NVIDIA GPU Operator and NVIDIA NIM Operator: troubleshooting complex GPU orchestration issues, optimizing with Multi-Instance GPU (MIG), and ensuring efficient utilization in Kubernetes environments.
  • Proficiency with TensorRT-LLM, Triton, and TensorRT for model optimization and serving (a minimal client sketch follows this list).
  • A track record of optimizing LLMs for low-latency inference in enterprise environments.
  • BS or equivalent experience in CS/Engineering.
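Purely illustrative (not part of the requirements): a minimal Triton Inference Server round-trip over HTTP using the `tritonclient` package. The model name (`simple_fp32`), tensor names (`INPUT0`/`OUTPUT0`), and input shape are hypothetical placeholders; real values come from the served model's configuration.

```python
# Sketch: query a Triton Inference Server over HTTP. The model "simple_fp32"
# and tensor names INPUT0/OUTPUT0 are hypothetical; substitute the actual
# names from your model's config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request input and attach data.
data = np.random.rand(1, 16).astype(np.float32)
infer_input = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

# Run inference and read back the named output tensor.
result = client.infer(model_name="simple_fp32", inputs=[infer_input])
print(result.as_numpy("OUTPUT0"))
```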
Ways to stand out from the crowd:

  • Prior experience deploying NVIDIA NIM microservices for multi-model inference (a minimal request sketch follows this list).
  • Experience with serverless inference and FaaS patterns (e.g., Google Cloud Run, AWS Lambda, NVCF) on NVIDIA GPUs.
  • NVIDIA Certified AI Engineer or similar.
  • Active contributions to Kubernetes SIGs or AI inference projects (e.g., KServe, Dynamo, SGLang or similar).
  • Familiarity with networking concepts that support multi-node inference, such as MPI, LWS (LeaderWorkerSet), or similar.
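For illustration only: NIM microservices expose an OpenAI-compatible API, so a deployed endpoint can be exercised with a plain HTTP request. The host and port (`localhost:8000`) and the model identifier (`meta/llama-3.1-8b-instruct`) are assumptions for this sketch; substitute the values of your own deployment.

```python
# Sketch: call a locally deployed NIM endpoint via its OpenAI-compatible
# chat-completions API. Host, port, and model id are assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Summarize MIG in one sentence."}],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```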
You will also be eligible for equity and benefits.