About The Position

NVIDIA is seeking a highly technical Product Manager to own the evolution of NVIDIA Dynamo, our flagship distributed inference framework. In this role, you will define the roadmap for high-scale LLM and generative AI serving, bridging the gap between cutting-edge hardware (Vera Rubin, LPU, and NVLink) and software optimizations such as disaggregated serving, KV-aware routing, and intelligent KV cache management. We need a self-starter to continue growing the product portfolio and to work with customers to incorporate model evaluation into end-to-end LLM workflows. We're looking for a rare blend of technical skill, product sense, and passion for groundbreaking technology. If this sounds like you, we would love to learn more about you!

Requirements

  • 12+ years of demonstrated product management experience at a technology company, as a co-founder or in a related technical role at a startup, or equivalent experience.
  • Bachelor's degree in Computer Science or a related field (or equivalent experience).
  • Proven experience in AI inference, distributed systems, and GPU-accelerated computing.
  • Deep understanding of the LLM inference lifecycle (prefill vs. decode), KV cache mechanics, and distributed serving techniques such as disaggregated serving.
  • Ability to translate low-level technical capabilities into high-level business value (e.g., reduced total cost of ownership (TCO), faster time to first token (TTFT)).
  • Teamwork and influencing skills to navigate effectively in a highly matrixed environment. At NVIDIA, the entire company is on your team!
  • Empathy and deep care for your customers to build products people love.
  • Pragmatic, data-driven project management skills to balance software development lifecycle requirements, product release schedules, and customer needs, and to deliver quality software on schedule.

Nice To Haves

  • Proven track record working with Agentic frameworks (LangChain, NeMo Agents) or building multi-turn, stateful AI applications.
  • Knowledge of trends in LLMs and generative AI, responsible AI, and MLOps.
  • Technical background and hands-on experience building AI (and LLM) solutions as an engineer. We expect you to have intuition for ML models and systems evaluation, and to read relevant research papers to inform your product strategy and roadmap.

Responsibilities

  • Core Dynamo Architecture: Drive the product strategy for Dynamo’s modular components, including the KV-aware Router, KV Block Manager (KVBM), and communication planes.
  • Inference Orchestration: Define requirements for sophisticated routing logic that minimizes redundant prefill and optimizes Time to First Token (TTFT) across substantial GPU clusters.
  • Memory & KV Cache Management: Define strategy for multi-tier KV cache offloading enabling long-context windows and high-concurrency serving without compromising user experience.
  • Hardware-Software Co-Design: Collaborate with engineering to ensure Dynamo extracts maximum performance from NVIDIA hardware.
  • Agentic Inference: Develop Agent-first capabilities (e.g. priority, output length, cache pinning) to support sophisticated, multi-turn reasoning.
  • Ecosystem Integration: Partner with open-source communities, e.g. vLLM, SGLang, TensorRT-LLM, and internal teams (NeMo Agent Toolkit).
  • Product Leadership: Author product requirements documents (PRDs) and software application design documents (SADDs). Build for ease of use, extensibility, and modularity. Work with TPMs to align roadmaps and respond to market trends.

Benefits

  • NVIDIA offers highly competitive salaries and a comprehensive benefits package.
  • You will also be eligible for equity.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Number of Employees: 5,001-10,000
