Performance Modeling Engineer

DensityAI • Mountain View, CA

11d • $180,000 - $250,000 • Onsite

About The Position

You will own the pre-silicon performance modeling and analysis that sets the architectural targets for our AI accelerator silicon. You'll characterize target ML workloads, build the analytical and roofline models that project performance onto proposed hardware, and turn that analysis into the PPA trade-off guidance the architecture, RTL, and compiler teams design against, well before first silicon. Because the role involves access to ITAR-controlled information, applicants must be U.S. persons.

Requirements

Strong computer-architecture fundamentals — memory hierarchy, compute/bandwidth roofline, dataflow, on-chip interconnect/NoC, and accelerator/GPU/TPU-class datapaths
Demonstrated performance modeling or analysis experience: analytical or simulation-based projection of real workloads onto hardware, where your results drove actual design decisions
Deep understanding of how ML workloads map to hardware: GEMM/conv/attention, quantization, parallelism (data/tensor/pipeline), and collective communication
Fluency in Python (C++ a plus) for building models, analysis pipelines, and trace/data analysis at scale
5+ years in performance architecture, modeling, or analysis for CPUs, GPUs, accelerators, or complex SoCs
Must be a U.S. person (U.S. citizen, U.S. permanent resident, asylee, or refugee) as defined in 22 CFR 120.62

Responsibilities

Own pre-silicon performance modeling and analysis — workload characterization, roofline / analytical models, and what-if trade-off studies that guide microarchitecture decisions before RTL is committed
Translate target ML workloads (transformer training/inference, attention, GEMM/conv, collectives) into performance projections across compute, memory-bandwidth, and interconnect bottlenecks
Drive PPA (performance / power / area) trade-off analysis with architecture, RTL, and software/compiler teams — recommend where to spend area, bandwidth, and power for the most performance
Define and own the performance KPIs and the methodology for tracking them from architecture through silicon
Correlate model projections against RTL, emulation, and post-silicon data as it arrives, and feed the deltas back into the model to keep it predictive