About The Position

We are looking for a highly skilled Performance Modeling Architect to lead the architectural definition and improvement of our next-generation CPU cache hierarchies and interconnects. This is an opportunity to create scalable solutions that span two demanding domains: the high-reliability, low-latency needs of Automotive and the high-efficiency, high-density demands of Data Center systems. You will build the "source of truth" models that govern data movement across our silicon, ensuring our last-level caches (L3/System Cache) and coherent fabrics achieve ambitious performance goals.

Requirements

  • A Master’s or Ph.D. in Computer Engineering, Electrical Engineering, or Computer Science (or equivalent experience) with a focus on computer architecture, plus 5+ years of experience.
  • Strong understanding of CPU microarchitecture, memory consistency models, and cache coherency protocols.
  • Proven experience in C++ or SystemC for cycle-accurate or functional modeling.
  • Proficiency in Python or similar scripting languages for processing large datasets, generating performance visualizations, and automating simulation sweeps.
  • Understanding of Network-on-Chip (NoC) topologies (Mesh, Ring, Torus), credit-based flow control, and arbitration logic.

Nice To Haves

  • Practical experience balancing the functional safety (ISO 26262) requirements of automotive chips against the power-performance-area (PPA) constraints of data center hardware.
  • Experience defining or using PMU (Performance Monitoring Unit) events to debug performance on real silicon or emulators.
  • A background in using formal verification or mathematical modeling to prove the correctness of complex coherency state machines.
  • A history of building your own internal tools or frameworks to accelerate architectural exploration rather than just using off-the-shelf simulators.
  • Knowledge of emerging memory technologies like CXL (Compute Express Link) or HBM (High Bandwidth Memory) and how they interact with coherent fabrics.

Responsibilities

  • Developing and maintaining high-fidelity, cycle-accurate performance models (C++/SystemC) for coherent interconnects and large-scale shared caches.
  • Modeling and analyzing performance bottlenecks across varying scales, from small-cluster automotive SoCs to massive, multi-mesh data center architectures.
  • Evaluating the performance impact of different coherency protocols (e.g., CHI, ACE, or proprietary) and snoop filters.
  • Running and analyzing industry-standard benchmarks (SPEC, MLPerf, Automotive-specific suites) to drive architectural trade-offs.
  • Collaborating with design and verification teams to correlate performance models with silicon, and working with software teams to optimize drivers for the underlying hardware topology.

Benefits

  • equity
  • benefits
© 2026 Teal Labs, Inc