GPU Architect/Designer

Syndesus • Remote

About The Position

Our client is a well-funded, venture-backed semiconductor startup developing next-generation GPU technology. The company is in a growth stage with significant capital backing and is building a world-class engineering team to design high-performance, scalable GPU architectures from the ground up. This is a rare opportunity to join at a foundational stage and directly shape the direction of cutting-edge silicon.

Requirements

Bachelor's, Master's, or PhD in Computer Engineering, Electrical Engineering, or Computer Science.
10+ years of experience in GPU, CPU, or parallel processor architecture.
Strong experience with SIMT/SIMD architectures.
Strong experience with shader core design.
Strong experience with thread scheduling.
Strong experience with pipeline microarchitecture.
Strong experience with memory hierarchy design.
Proficiency in SystemVerilog or Verilog.
Proficiency in microarchitecture specification development.
Proficiency in performance modeling tools.
Proficiency in RTL-level debugging.
Deep understanding of parallel computing models.
Deep understanding of GPU execution models.
Deep understanding of pipeline hazard handling.
Deep understanding of synchronization primitives.

Responsibilities

Define and evolve GPU shader core architecture, including SIMT execution units and pipeline design.
Design warp/wavefront scheduling, thread dispatch, and execution models.
Architect SIMT execution pipelines, including ALU pipelines, vector units, and control flow units.
Define thread divergence handling, reconvergence strategies, and branch control mechanisms.
Develop scalable shader architectures supporting high thread-level parallelism.
Collaborate on ISA definitions related to shader and compute workloads.
Analyze shader workloads and identify performance bottlenecks.
Optimize GPU execution efficiency across diverse workloads, including compute shaders, AI/ML kernels, and other high-performance parallel workloads.
Drive performance-per-watt and area efficiency improvements.
Define GPU memory subsystem interactions, including register files, shared/local memory, the L1/L2 cache hierarchy, and memory coalescing mechanisms.
Optimize memory access scheduling and bandwidth utilization.
Collaborate on interconnect and memory fabric architecture.
Translate architectural specifications into microarchitecture definitions.
Implement shader pipeline logic in SystemVerilog.
Define architectural test plans and validation strategies.
Develop directed tests, constrained-random tests, and performance validation frameworks.
Analyze simulation and silicon results to drive design improvements.