Principal Engineer, NPU Architect

Renesas Electronics, Austin, TX
Hybrid

About The Position

We are looking for a Principal NPU Hardware Architect with 10 to 15 years of experience to drive the architectural definition and hardware implementation of high-performance Neural Processing Units (NPUs) targeted at microcontrollers and microprocessors for automotive high-performance compute. This is a hardware-oriented role that requires a deep understanding of the full silicon lifecycle, combined with a strong background in hardware-software co-design, to ensure the NPU architecture is highly optimized for compiler-driven execution and software stacks.

Requirements

  • 12+ years in AI accelerator, NPU, or GPU hardware architecture and RTL design.
  • Deep knowledge of deep learning primitives (CNNs, Transformers, RNNs) and how they map to spatial compute hardware.
  • Strong understanding of compiler backends (e.g., LLVM, MLIR), IR transformations, and how hardware features like scratchpad memories or tiling impact compiler efficiency (see the tiling sketch after this list).
  • Proven track record with modern SoC protocols (AXI/ACE/CHI) and integrating NPU cores into larger system-on-chip environments.
  • Expert-level proficiency in SystemC/TLM or C++ for architectural performance modeling and hardware-software co-verification.
  • Ability to act as a technical authority, mentoring junior designers and influencing cross-functional roadmaps.
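
The compiler-backend requirement above turns on a concrete idea: blocking a computation so each working set fits in a small local memory. Below is a minimal C++ sketch of scratchpad-oriented loop tiling for a matmul; the matrix size, tile size, and the DMA comments are illustrative assumptions, not details of any Renesas NPU.

```cpp
// Minimal sketch: how loop tiling maps a matmul onto a small scratchpad.
// All sizes and names here are illustrative assumptions.
#include <cstdio>
#include <vector>

constexpr int N = 64;     // square matrix dimension (assumed)
constexpr int TILE = 16;  // tile edge chosen so three tiles fit in scratchpad

// C = A * B, blocked so each (TILE x TILE) working set could live in a
// scratchpad instead of streaming from DRAM on every access.
void tiled_matmul(const std::vector<float>& A,
                  const std::vector<float>& B,
                  std::vector<float>& C) {
    for (int i0 = 0; i0 < N; i0 += TILE)
        for (int j0 = 0; j0 < N; j0 += TILE)
            for (int k0 = 0; k0 < N; k0 += TILE)
                // The inner loops touch only three TILE x TILE blocks; a
                // compiler backend would emit DMA loads of those blocks into
                // scratchpad here, then run the MACs out of local memory.
                for (int i = i0; i < i0 + TILE; ++i)
                    for (int j = j0; j < j0 + TILE; ++j) {
                        float acc = C[i * N + j];
                        for (int k = k0; k < k0 + TILE; ++k)
                            acc += A[i * N + k] * B[k * N + j];
                        C[i * N + j] = acc;
                    }
}

int main() {
    std::vector<float> A(N * N, 1.0f), B(N * N, 1.0f), C(N * N, 0.0f);
    tiled_matmul(A, B, C);
    std::printf("C[0] = %.1f (expect %d)\n", C[0], N);  // 64.0
    return 0;
}
```

The tile size is the key knob: it must be small enough that the A, B, and C blocks fit in scratchpad simultaneously, yet large enough to amortize the DMA transfers that would replace the direct array accesses here.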

Nice To Haves

  • Bachelor’s or Master’s in Electrical Engineering or Computer Engineering (PhD desirable)

Responsibilities

  • NPU Architecture & Dataflow: Define and own the end-to-end NPU micro-architecture, including high-throughput tensor/matrix engines, vector units, and specialized activation functional units.
  • Hardware-Software Co-Design: Partner closely with compiler and software teams to define instruction sets (ISA), memory management schemes, and hardware-aware graph optimizations.
  • Virtualization & Multi-Tenancy: Architect hardware-assisted virtualization features to enable secure resource sharing and multi-tenant execution in cloud or edge environments.
  • Interconnect & Fabric: Design and integrate high-bandwidth bus fabrics (e.g., NoC, CHI) and DMA controllers optimized for the massive data movement inherent in AI workloads.
  • Infrastructure & Power: Lead the definition of SoC infrastructure elements, including complex clock/reset domains and advanced power management strategies to maximize performance-per-watt.
  • Performance Modeling: Develop bit-accurate and cycle-accurate C++/SystemC models to validate architectural choices and enable early software development (see the modeling sketch after this list).
  • Full Design Flow: Oversee the transition from architectural spec to RTL, providing technical leadership through verification, physical design, and post-silicon bring-up.
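
To give a flavor of the performance-modeling responsibility above, here is a minimal cycle-approximate analytical model in plain C++ (a production model would be a bit- and cycle-accurate SystemC/TLM one, as the posting says). Every number in it, the MAC-array width, the DMA bandwidth, and the example layer, is an illustrative assumption.

```cpp
// Minimal sketch of a cycle-approximate analytical model for a MAC array.
// Parameters (array size, bandwidth, example layer) are assumptions only.
#include <algorithm>
#include <cstdint>
#include <cstdio>

struct NpuConfig {
    int macs_per_cycle = 256;       // e.g., a 16x16 MAC array (assumed)
    double bytes_per_cycle = 32.0;  // sustained DMA bandwidth (assumed)
};

struct Layer {
    int64_t macs;         // total multiply-accumulates in the layer
    int64_t bytes_moved;  // weights + activations streamed in/out
};

// Roofline-style estimate: the layer is bound by whichever of compute or
// data movement takes longer, assuming compute and DMA fully overlap.
int64_t estimate_cycles(const NpuConfig& cfg, const Layer& layer) {
    int64_t compute =
        (layer.macs + cfg.macs_per_cycle - 1) / cfg.macs_per_cycle;
    int64_t dma = static_cast<int64_t>(layer.bytes_moved / cfg.bytes_per_cycle);
    return std::max(compute, dma);
}

int main() {
    NpuConfig cfg;
    // 3x3 conv, 64 -> 64 channels, 56x56 output (a ResNet-like layer).
    Layer conv{/*macs=*/int64_t(56) * 56 * 64 * 64 * 3 * 3,
               /*bytes_moved=*/int64_t(4) * 1024 * 1024};
    std::printf("estimated cycles: %lld\n",
                static_cast<long long>(estimate_cycles(cfg, conv)));
    return 0;
}
```

The roofline-style max() bakes in the assumption that compute and DMA overlap perfectly; a cycle-accurate model replaces exactly that assumption with per-cycle simulation of the array, the DMA engines, and the fabric between them.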