Senior Performance Modeling Architect, CPU Fabric and LLC

NVIDIA • Santa Clara, CA

About The Position

We are looking for a highly skilled Performance Modeling Architect to lead the architectural definition and improvement of our next-generation CPU cache hierarchies and interconnects. This is an outstanding opportunity to create scalable solutions that span two fast-paced domains: the high-reliability, low-latency needs of Automotive and the high-efficiency, high-density demands of Data Center systems. You will build the "source of truth" models that govern data movement across our silicon, ensuring our last-level caches (L3/System Cache) and coherent fabrics achieve ambitious performance goals.

Requirements

A Master’s or Ph.D. in Computer Engineering, Electrical Engineering, or Computer Science with a focus on architecture (or equivalent experience), plus 5+ years of experience.
Strong understanding of CPU microarchitecture, memory consistency models, and cache coherency protocols.
Proven experience in C++ or SystemC for cycle-accurate or functional modeling.
Proficiency in Python or similar scripting languages for processing large datasets, generating performance visualizations, and automating simulation sweeps.
Understanding of Network-on-Chip (NoC) topologies (Mesh, Ring, Torus), credit-based flow control, and arbitration logic.

Nice To Haves

Practical experience balancing the functional safety (ISO 26262) requirements of automotive chips against the power-performance-area (PPA) constraints of data center hardware.
Experience defining or using PMU (Performance Monitoring Unit) events to debug performance on real silicon or emulators.
A background in using formal verification or mathematical modeling to prove the correctness of complex coherency state machines.
A history of building your own internal tools or frameworks to accelerate architectural exploration rather than just using off-the-shelf simulators.
Knowledge of emerging memory technologies like CXL (Compute Express Link) or HBM (High Bandwidth Memory) and how they interact with coherent fabrics.

Responsibilities

Developing and maintaining high-fidelity, cycle-accurate performance models (C++/SystemC) for coherent interconnects and large-scale shared caches.
Modeling and analyzing performance bottlenecks across varying scales, from small-cluster automotive SoCs to massive, multi-mesh data center architectures.
Evaluating the performance impact of different coherency protocols (e.g., CHI, ACE, or proprietary) and snoop filters.
Running and analyzing industry-standard benchmarks (SPEC, MLPerf, automotive-specific suites) to drive architectural trade-offs.
Collaborating with design and verification teams to correlate performance models with silicon, and working with software teams to optimize drivers for the underlying hardware topology.