ASIC Architect

Cerebras Systems, Sunnyvale, CA

About The Position

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and lets machine learning users run large-scale ML applications without the hassle of managing hundreds of GPUs or TPUs.

Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference. Thanks to the wafer-scale architecture, Cerebras Inference offers the fastest generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

Requirements

  • Masters/PhD in Electrical/Computer Engineering
  • 10+ years of experience in performance analysis and modeling across GPUs, CPUs, or accelerator products
  • Strong background in computer architecture and key high-level architectural trade-offs
  • Comfortable standing up new performance models from scratch in Python or similar analytical environments
  • Exposure to micro-code (kernel) performance bottlenecks and optimization techniques
  • Good understanding of how high-level workloads map to underlying micro-architecture is desired

Nice To Haves

  • Understanding of basic ML workload profiling techniques and model network architecture is preferred

Responsibilities

  • Translate high-level architecture specs into micro-architecture feature requirements
  • Bring up new features in the performance/power model
  • Perform comprehensive PPA trade-offs for new architectural features
  • Extract insights for new features and micro-architecture power efficiency
  • Profile workloads, identify bottlenecks, and project competitor performance for benchmarking
  • Engage with SW teams on end-to-end, application-level modeling at cluster level
  • Identify kernel-level HW acceleration opportunities

Benefits

  • Build a breakthrough AI platform beyond the constraints of the GPU.
  • Publish and open-source your cutting-edge AI research.
  • Work on one of the fastest AI supercomputers in the world.
  • Enjoy job stability with startup vitality.
  • Enjoy a simple, non-corporate work culture that respects individual beliefs.