Lead RTL Design Engineer

Efficient Computer • San Jose, CA

About The Position

Efficient is developing the world’s most energy-efficient general-purpose computer processor. Efficient’s patented technology uses 100x less energy than state-of-the-art, commercially available ultra-low-power processors and is programmable using standard high-level programming languages and AI/ML frameworks. This level of efficiency makes perpetual, pervasive intelligence possible: run AI/ML continuously on a AA battery for 5-10 years. Our platform’s unprecedented efficiency enables IoT devices to intelligently capture and curate first-party data to drive the next major computing revolution.

We are looking for a Lead RTL Design Engineer to own microarchitecture definition and RTL implementation across the dataflow execution fabric, memory subsystem, on-chip interconnect/NoC, low-power logic, and integration of standard peripheral IP (RISC-V, NVM, I2S, I2C). You will work from architecture spec through synthesis-ready RTL, collaborating with architects, microarchitects, DV leads, physical design, and firmware teams to tape out an industry-leading power-efficient SoC.

This is a unique opportunity to be part of a newly formed HW engineering org and to influence our products and processes as we move from the initial stages of product development to market release and scaled volume production. Join our team and help us shape the future of computing at the edge and beyond!

Requirements

8+ years of RTL design experience with tape-out ownership of dataflow-based designs, on-chip networks, memory subsystems, or peripheral integration on a processor or accelerator SoC.
Deep proficiency in SystemVerilog for RTL: synthesis-clean, lint-clean, and timing-aware; able to design complex state machines, arbiters, token flow controllers, and datapath logic from scratch.
Solid understanding of parallel execution models: dataflow, SIMD, or systolic array architectures; familiarity with the hardware challenges of token-based firing-rule evaluation and producer-consumer synchronization.
Hands-on experience with on-chip memory design: SRAM wrappers, scratchpad/TCM, banking, and memory-mapped register interfaces.
Experience with low-power RTL techniques: UPF-driven flows, clock gating, power domains, retention registers, and always-on (AON) wake-up logic.
Familiarity with at least one standard on-chip bus protocol (AXI, AHB, APB, TileLink, or NoC equivalent) at the RTL implementation level.
Experience taking RTL through synthesis and timing closure; ability to read and act on SDC constraints, STA reports, and synthesis QoR summaries.
Strong written communication skills; able to produce uArch specs and design review material independently.
Experience with memory compiler toolchains.

Nice To Haves

Prior RTL ownership of a dataflow engine, neural processing unit (NPU), or streaming DSP architecture with explicit producer-consumer token management.
Experience collaborating with compiler or graph-optimization teams to co-design hardware execution models and graph IR representations.
Familiarity with NVM controller RTL (MRAM, RRAM) including ECC, program/erase sequencing, and model weight storage use cases.
Experience with IoT-class power budgets (sub-10 mW active, sub-100 µW standby) and the RTL design choices they necessitate.
Familiarity with functional safety standards (ISO 26262, IEC 61508) as applied to execution fabric error detection and power domain isolation.
Exposure to AI framework graph formats (ONNX, TFLite) and understanding of how graph compilation maps to hardware execution primitives.
Tape-out credits on an edge-AI, IoT, or wearable SoC at 12 nm or below.
Experience with formal verification of flow-control logic, deadlock freedom, or bus protocol compliance.

Responsibilities

Own the design and definition of processor and compute-unit microarchitecture, including dataflow pipelines, execution units, and interfaces. Set performance, power, and area targets, and guide the team toward achieving them.
Define and drive the design of on-chip networks and data movement across the fabric, balancing performance, scalability, and implementation constraints in collaboration with physical design.
Define the interface to the memory subsystem, including data movement, ordering, and synchronization behavior, ensuring a clean and scalable model for software and future system expansion.
Lead the architecture of configuration, scheduling, and execution of workloads on the fabric, including multi-kernel support and interaction with host systems.
Drive power architecture across the design, including clocking, reset, power domains, and low-power strategies to meet aggressive energy and efficiency goals.
Collaborate closely with compiler and software teams to define the hardware execution model, ensuring efficient mapping of workloads onto the architecture.
Author and own uArch specification documents for assigned blocks; drive design reviews with architecture, compiler, DV, and physical design stakeholders.
Mentor senior and junior RTL engineers; review RTL, flag microarchitecture risks, and enforce coding style and lint-clean standards across the team.
Participate in PPA analysis loops: synthesize blocks regularly, review area/timing/power reports, and make data-driven tradeoffs against performance and feature requirements.
Collaborate with DV leads to define and review verification plans; provide directed test scenarios for graph execution corner cases, back-pressure conditions, and power state transitions.
Support silicon bring-up: contribute scan/ATPG guidelines, review DFT insertion, and provide RTL-level debug assistance during lab validation.