GPU Architect/Designer

Syndesus, Remote

About The Position

Our client is a well-funded, venture-backed semiconductor startup developing next-generation GPU technology. The company is in a growth stage with significant capital backing and is building a world-class engineering team to design high-performance, scalable GPU architectures from the ground up. This is a rare opportunity to join at a foundational stage and directly shape the direction of cutting-edge silicon.

Requirements

  • Bachelor's, Master's, or PhD in Computer Engineering, Electrical Engineering, or Computer Science.
  • 10+ years of experience in GPU, CPU, or parallel processor architecture.
  • Strong experience with SIMT / SIMD architectures.
  • Strong experience with shader core design.
  • Strong experience with thread scheduling.
  • Strong experience with pipeline microarchitecture.
  • Strong experience with memory hierarchy design.
  • Proficiency in SystemVerilog or Verilog.
  • Proficiency in microarchitecture specification development.
  • Proficiency in performance modeling tools.
  • Proficiency in RTL-level debugging.
  • Deep understanding of parallel computing models.
  • Deep understanding of GPU execution models.
  • Deep understanding of pipeline hazard handling.
  • Deep understanding of synchronization primitives.

Responsibilities

  • Define and evolve GPU shader core architecture, including SIMT execution units and pipeline design.
  • Design warp/wavefront scheduling, thread dispatch, and execution models.
  • Architect SIMT execution pipelines, including ALU pipelines, vector units, and control flow units.
  • Define thread divergence handling, reconvergence strategies, and branch control mechanisms.
  • Develop scalable shader architectures supporting high thread-level parallelism.
  • Collaborate on ISA definitions related to shader and compute workloads.
  • Analyze shader workloads and identify performance bottlenecks.
  • Optimize GPU execution efficiency across diverse workloads including compute shaders, AI/ML kernels, and high-performance parallel workloads.
  • Drive performance-per-watt and area efficiency improvements.
  • Define GPU memory subsystem interactions including register files, shared/local memory, L1/L2 cache hierarchy, and memory coalescing mechanisms.
  • Optimize memory access scheduling and bandwidth utilization.
  • Collaborate on interconnect and memory fabric architecture.
  • Translate architectural specifications into microarchitecture definitions.
  • Implement shader pipeline logic in SystemVerilog.
  • Define architectural test plans and validation strategies.
  • Develop directed tests, constrained-random tests, and performance validation frameworks.
  • Analyze simulation and silicon results to drive design improvements.

Benefits

  • Meaningful equity.