Head of Supercomputing

Etched, San Jose, CA
Onsite

About The Position

Etched is building AI chips that are hard-coded for individual model architectures. Our first product, Sohu, supports only transformers but delivers an order of magnitude more throughput and lower latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, such as real-time video generation models and extremely deep, parallel chain-of-thought reasoning agents.

In this role, you will lead system software development for Etched’s groundbreaking inference acceleration systems. As Head of Supercomputing Software, you will guide talented engineering and test teams responsible for the full low-level stack (firmware, drivers, OS, monitoring, and test automation). Key responsibilities include attracting and developing world-class talent, defining technical strategy, driving quality execution from silicon bring-up through production, and collaborating across hardware, software, and manufacturing partners to deliver high-performance, reliable ML platforms.

Requirements

  • Proven experience (5+ years) managing software engineering teams, ideally in system software, embedded Linux, server firmware, or system-level validation
  • A strong track record of attracting, developing, mentoring, and retaining top engineering talent, fostering high-performing and diverse teams
  • Deep technical expertise in areas such as BIOS/UEFI, BMC firmware, Root of Trust/Secure Boot, Linux kernel, PCIe device drivers, OS-level services, and system monitoring
  • Solid understanding of server and hardware architecture, including CPU/SoC/ASIC design, memory hierarchies, and PCIe interconnects
  • Experience managing the complete software development lifecycle: requirements, design, implementation, testing, release, and maintenance
  • History of successfully delivering complex system software projects tightly coupled with hardware platforms
  • Excellent leadership, communication, and collaboration skills across functional boundaries
  • Strong problem-solving abilities, especially in debugging complex hardware/software system interactions
  • Familiarity with modern development workflows including Git, CI/CD pipelines, and software engineering best practices

Nice To Haves

  • System software development or validation for AI/ML accelerators or custom ASIC/SoC hardware platforms
  • Deploying and managing diagnostics and software tests in high-volume manufacturing environments (e.g., factory test, L10)
  • Working with OpenBMC, Redfish, or other modern BMC firmware stacks and related standards
  • Deep knowledge of system-level security: threat modeling, secure boot, and secure development lifecycle practices
  • Managing container technologies like Docker and Kubernetes at the node/system level
  • Familiarity with working alongside server contract manufacturers in APAC, including logistics and test support coordination

Responsibilities

  • Team Leadership & Talent Development: Lead, manage, and inspire high-caliber teams of system software developers and system test engineers.
  • Team Building: Build and scale world-class system software and test teams by attracting, hiring, and retaining top-tier engineering talent.
  • Technical Strategy & Roadmap: Define and drive the technical strategy, architecture, and development roadmap for the entire system software stack (UEFI/BIOS, BMC, RoT, drivers, OS, and monitoring).
  • Coach and Mentor: Actively coach and mentor team members, fostering professional growth through challenging assignments, targeted development plans, and continuous feedback. Cultivate a team culture focused on technical excellence and results.
  • Execution & Delivery: Oversee the end-to-end software development lifecycle, including design, implementation, testing, validation, and release.
  • System Validation Leadership: Provide direction and oversight for the System Test engineering teams responsible for validating and qualifying the Etched ML system software stack.
  • Manufacturing Test Integration: Collaborate closely with Manufacturing Operations, Test Engineering, and external partners (CMs) to deliver system software testing and diagnostics to manufacturing environments.
  • Cross-Functional Collaboration: Partner effectively with ASIC design, hardware platform engineering, and external manufacturing partners to ensure seamless hardware/software integration and address system-level challenges.
  • Golden Image Management: Oversee the creation, maintenance, and release process for the validated golden reference container images.
  • Resource Management: Manage project priorities, deadlines, and resources effectively across development and test teams for multiple concurrent projects.

Benefits

  • Full medical, dental, and vision packages, with generous premium coverage
  • Housing subsidy of $2,000/month for those living within walking distance of the office
  • Daily lunch and dinner in our office
  • Relocation support for those moving to West San Jose

What This Job Offers

  • Job type: Full-time
  • Career level: Director
  • Education level: Not listed
  • Number of employees: 101-250