Lead Vector Compute Architect

Bolt Graphics
Sunnyvale, CA
$180,000 - $220,000
Onsite

About The Position

We are looking for an experienced and highly motivated Lead Vector Compute Architect to drive the architecture definition and technical direction for Bolt’s next-generation GPUs. The ideal candidate will have strong expertise in data parallel compute unit architecture development, performance modeling, data path integration, and cross-functional collaboration across hardware, software, and systems teams. The role involves defining scalable, high-performance architectures for advanced compute workloads, including graphics, HPC, and system management. It is on-site, and candidates must be local to the Bay Area.

Requirements

  • Strong understanding of modern data parallel microarchitectures and subsystem integration.
  • 6+ years of experience in modern data parallel microarchitecture, including workload characterization and profiling, performance modeling, out-of-order data dependency and control, utilization/occupancy optimization, and high-performance architecture design techniques.
  • Experience with one or more of the following: CPU/GPU/NPU architectures; NoC/interconnect architectures; cache coherency protocols (CHI/ACE/CXL); high-speed interfaces (PCIe, UCIe, Ethernet); memory systems (DDR, LPDDR, HBM, GDDR); power, performance, and area optimization.
  • Strong knowledge of RTL development and verification methodologies.
  • Experience with architecture modeling and performance analysis tools.
  • Familiarity with firmware/software interaction in complex SoC systems.
  • Excellent problem-solving, communication, and leadership skills.

Responsibilities

  • Define data parallel microarchitecture satisfying ISA constraints.
  • Drive architecture tradeoff analysis for performance, power, area, bandwidth, latency, and scalability.
  • Develop and review system architecture specifications, interface definitions, and microarchitecture requirements.
  • Collaborate with RTL, verification, physical design, firmware, software, and system teams throughout the development cycle.
  • Lead performance modeling, workload analysis, and bottleneck identification using C/C++/SystemC or similar modeling environments.
  • Define memory hierarchy, coherency architecture, and cache structures.
  • Work closely with verification teams to define architectural test plans and validation strategies.
  • Support silicon bring-up, debug, performance tuning, and post-silicon optimization.
  • Contribute to long-term technology and product roadmap planning.

Benefits

  • Medical, Dental, & Vision - 100% covered premiums
  • Equity - Stock Options
  • 401(k) match
  • WFH Hardware