Lead Vector Compute Architect

Bolt Graphics • Sunnyvale, CA

7d • $180,000 - $220,000 • Onsite

About The Position

We are looking for an experienced and highly motivated Lead Vector Compute Architect to drive architecture definition and technical direction for Bolt’s next-generation GPUs. The ideal candidate will have strong expertise in data parallel compute unit architecture development, performance modeling, data path integration, and cross-functional collaboration across hardware, software, and systems teams. The role involves defining scalable, high-performance architectures for advanced compute workloads, including graphics, HPC, and system management. It is on-site, and candidates must be local to the Bay Area.

Requirements

Strong understanding of modern data parallel microarchitectures and subsystem integration.
6+ years of experience in modern data parallel microarchitecture, including workload characterization and profiling, performance modeling, out-of-order data dependency and control, utilization/occupancy optimization, and high-performance architecture design techniques.
Experience with one or more of the following: CPU/GPU/NPU architectures; NoC/interconnect architectures; cache coherency protocols (CHI/ACE/CXL); high-speed interfaces (PCIe, UCIe, Ethernet); memory systems (DDR, LPDDR, HBM, GDDR); power, performance, and area optimization.
Strong knowledge of RTL development and verification methodologies.
Experience with architecture modeling and performance analysis tools.
Familiarity with firmware/software interaction in complex SoC systems.
Excellent problem-solving, communication, and leadership skills.

Responsibilities

Define a data parallel microarchitecture that satisfies ISA constraints.
Drive architecture tradeoff analysis for performance, power, area, bandwidth, latency, and scalability.
Develop and review system architecture specifications, interface definitions, and microarchitecture requirements.
Collaborate with RTL, verification, physical design, firmware, software, and system teams throughout the development cycle.
Lead performance modeling, workload analysis, and bottleneck identification using C/C++/SystemC or similar modeling environments.
Define memory hierarchy, coherency architecture, and cache structures.
Work closely with verification teams to define architectural test plans and validation strategies.
Support silicon bring-up, debug, performance tuning, and post-silicon optimization.
Contribute to long-term technology and product roadmap planning.