Senior Product Manager, Software & Developer Platform

quadric, Inc•Burlingame, CA

$200,000 - $250,000•Onsite

About The Position

Quadric has created an innovative general-purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware are targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the industry today that can only accelerate a portion of a machine learning graph, the Quadric GPNPU executes both NN graph code and conventional C++ DSP and control code.

Quadric is seeking a Senior Product Manager to own the software roadmap for the Chimera Graph Compiler (CGC), the developer-facing platform customers live in from pre-silicon through production. This role drives the monthly SDK release train, sets pattern coverage and quantization strategy, and works directly with anchor customers to convert model gaps into engineering roadmap items. You'll partner with the CPO on strategy and with SW engineering on execution. This role is based in Burlingame (on-site), with quarterly travel to Japan, the U.S. East Coast, and customer SW teams worldwide.

Requirements

Domain — non-negotiable. Shipped product on at least one of: NPU or AI accelerator IP/silicon stack; graph or ML compiler (TVM, MLIR, XLA, or proprietary); developer-facing AI inference runtime or agent framework. "Adjacent" does not count.
Modern AI workload fluency — non-negotiable. Ready conversation, no prep, on: agentic workflows and LLM serving, KV cache optimization, quantization schemes (AWQ, GPTQ, SmoothQuant, QAT vs. PTQ), datatypes (INT4, FP8, BF16, OCP MX), and inference platforms (vLLM, llama.cpp, TensorRT-LLM, ExecuTorch, ORT).
Shipping bar — non-negotiable. You shipped a developer-facing AI or compute product — SDK, runtime, compiler, or inference service — with real users and a release cadence you owned.
Agent-pilled — non-negotiable. You use agentic AI tools daily (Claude Code, Cursor, or equivalent) to produce work. Having read about agentic AI without integrating it is not sufficient.
Customer first — non-negotiable. When engineering wants to build the elegant thing and the customer needs the workable thing, you take the workable thing every time.
5+ years in PM, with 3+ years on a developer-facing AI/ML or compute platform
Owned a release cadence — picked what ships, what slips, and defended the call
Experience with in-person technical customer reviews
Bay Area resident or willing to relocate to Burlingame

Nice To Haves

ML background: graduate degree, published work, trained model, or OSS contribution
Automotive Tier 1 engagement; ISO 26262 awareness
Prior product work on a competing NPU/GPU/AI accelerator stack (MetaWare, Arm ML, Ceva, TensorRT, Hailo, Tenstorrent, etc.)
OSS contributions to vLLM, llama.cpp, TVM, MLIR, ONNX Runtime, or ExecuTorch

Responsibilities

Software release train. Own the monthly SDK release and quarterly major: contents, release sync, go/no-go, release notes, and customer communications.
Pattern coverage roadmap. Decide which graph patterns CGC compiles next — attention variants, quantization schemes, normalization patterns — and sequence them against customer model requirements each quarter.
Market-driven demo strategy. Lead with the market story and push it through every layer: demo, model zoo, pattern coverage, compiler work. Own what we publish and when.
Customer engagement. Present in technical reviews with anchor customers. Convert model gap lists into engineering-ready roadmap entries.
Quantization and numerics. Own the roadmap for INT4 (W4A8/W4A16), FP8, OCP MX, and KV cache compression. Coordinate with HW PM on MAC capability and with customer SW teams on model format decisions ahead of silicon tape-out.
Framework and runtime integrations. Define the integration strategy for GGML/llama.cpp, vLLM, ONNX Runtime, ExecuTorch, and HF Optimum — deep partnership vs. thin reference.
Model zoo. Maintain a set of customer-confidence models (LLM chat, BEVFormer, VLA, ADAS perception) that serve as forcing functions for compiler completeness.
Quarterly roadmap tours. Take the roadmap to anchor customers, prospects, and the field. Brief the PMM monthly on what shipped and how to position it.
Competitive intelligence. Track Synopsys MetaWare, Arm KleidiAI, Ceva NeuPro Studio, and NVIDIA TensorRT-LLM. Brief exec and sales quarterly.
Safety and quality. Coordinate with the safety lead on ISO 26262 traceability and qualification artifacts.