Head of Supercomputing

Etched
San Jose, CA
Onsite

About The Position

Etched is building at-scale AI inference supercomputers powered by our custom ASICs, and the Supercomputing organization is responsible for making them real, deployable, and reliable. We are seeking a Head of Supercomputing to define and lead the architecture, software stack, and operational model for Etched’s cluster-scale AI compute systems. This leader will own the end-to-end system software and control-plane strategy — spanning orchestration, telemetry, provisioning, networking, and fleet reliability — from first silicon through production deployment. This role combines deep systems expertise with strong organizational leadership. You will build and lead a world-class team, partnering closely with ASIC, hardware, kernel, runtime, and infrastructure teams to deliver the highest-performance AI inference systems in the world.

Requirements

  • 15+ years of experience in system software, infrastructure, or large-scale compute systems, including 5+ years leading engineering teams
  • Strong understanding of hardware/software interfaces such as PCIe, RDMA, memory hierarchies, interrupts, and device drivers
  • Experience building or operating cluster-scale systems (HPC, AI infrastructure, hyperscale compute, or custom accelerators)
  • Proven track record of delivering complex systems from early bring-up through production
  • Strong debugging skills across hardware–software interactions
  • Excellent leadership, communication, and cross-functional collaboration skills

Responsibilities

  • Define and drive the technical vision and roadmap for Etched’s Supercomputing software stack, from node bring-up to multi-rack clusters
  • Build, scale, and lead a high-performance Supercomputing organization
  • Directly manage and develop 15+ engineers with a variety of experience levels
  • Architect and own low-level control-plane software for system bring-up, provisioning, networking, configuration, and fleet management
  • Define orchestration primitives for managing devices, nodes, racks, and full cluster deployments
  • Oversee development of system services that interface directly with firmware, drivers, kernel subsystems, and runtime layers
  • Establish system telemetry and observability infrastructure for customers
  • Own the system software lifecycle from first silicon bring-up through stable production releases
  • Collaborate with manufacturing and test engineering to integrate diagnostics and system software into factory environments
  • Define reliability targets, operational metrics, and release processes for production deployments
  • Recruit, mentor, and retain exceptional systems engineers and engineering leaders
  • Act as a senior technical voice shaping company-wide infrastructure decisions

Benefits

  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office