About The Position

This role owns end-to-end technical delivery of AI and GPU-based platforms, from architecture through production deployment, acting as the system-level technical owner across silicon, firmware, hardware, OS, validation, and manufacturing, and partnering with internal teams and external partners to bring new technologies into production. Full responsibilities are listed below.

Requirements

  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or a related field AND 7+ years of technical engineering experience, OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or a related field AND 8+ years of technical engineering experience, OR equivalent experience.
  • 10+ years of relevant experience in system (compute, storage, networking, and/or accelerator) level design and/or implementation across the hardware development lifecycle.
  • 10+ years of hands-on experience in server hardware architecture, design, and development, with a solid understanding of hardware, firmware, and operating systems (OS).
  • Proven experience delivering AI and GPU-based systems to production.
  • Proven track record leading cross-functional technical execution across hardware, firmware, software, and datacenter infrastructure.
  • Deep system expertise in power delivery, thermal and liquid cooling, signal integrity, mechanical design, and reliability.
  • Experience bringing high-volume silicon platforms (GPU, SoC, accelerator) from architecture through production ramp.
  • Hands-on experience with PCIe, DDR, Ethernet, and BIOS/BMC, as well as Linux and Windows integration.
  • Experience with datacenter-scale AI systems, including system debug and root cause analysis.
  • Ability to evaluate AI systems using performance-per-watt and performance-per-dollar metrics.
  • Clear, concise communicator with the ability to influence technical direction across teams and at senior levels.

Responsibilities

  • Own end-to-end technical delivery of AI and GPU-based platforms from architecture through production deployment.
  • Act as the system-level technical owner, driving decisions across architecture, silicon, firmware, hardware, OS, validation, and manufacturing.
  • Define and manage technical baselines with TPMs, including scope, schedule, dependencies, and change control.
  • Own system design quality and completeness, ensuring alignment with AI workloads, GPU performance targets, and platform constraints.
  • Influence platform-, rack-, and datacenter-level architecture, including GPU clustering, interconnects, and power and liquid cooling solutions.
  • Evaluate and de-risk NUDD (new, unique, difficult, different) technologies such as new accelerators, memory hierarchies, and high-speed interconnects.
  • Drive system-level tradeoffs across power, thermal, mechanical, cost, reliability, and manufacturability to optimize performance and TCO.
  • Partner with validation and manufacturing teams to resolve issues and ensure production-ready designs.
  • Collaborate with internal teams and external partners, including silicon vendors and open-source communities, to integrate new technologies.