Head of Supercomputing

Etched, San Jose, CA
Onsite

About The Position

Etched is building the world’s first AI inference system purpose-built for transformers, delivering over 10x higher performance and dramatically lower cost and latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep and parallel chain-of-thought reasoning agents. Backed by hundreds of millions from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest-growing industry in history.

Lead System Software development for Etched’s groundbreaking Inference Acceleration Systems. As Senior Manager, System Software, you will guide talented engineering and test teams responsible for the full low-level stack (firmware, drivers, OS, monitoring, test automation). Key responsibilities include attracting and developing world-class talent, defining technical strategy, driving quality execution from silicon bring-up through production, and collaborating across hardware, software, and manufacturing partners to deliver high-performance, reliable ML platforms.

Requirements

  • Proven experience (5+ years) managing software engineering teams, ideally in system software, embedded Linux, server firmware, or system-level validation
  • A strong track record of attracting, developing, mentoring, and retaining top engineering talent, fostering high-performing and diverse teams
  • Deep technical expertise in areas such as BIOS/UEFI, BMC firmware, Root of Trust/Secure Boot, Linux kernel, PCIe device drivers, OS-level services, and system monitoring
  • Solid understanding of server and hardware architecture, including CPU/SoC/ASIC design, memory hierarchies, and PCIe interconnects
  • Experience managing the complete software development lifecycle: requirements, design, implementation, testing, release, and maintenance
  • History of successfully delivering complex system software projects tightly coupled with hardware platforms
  • Excellent leadership, communication, and collaboration skills across functional boundaries
  • Strong problem-solving abilities, especially in debugging complex hardware/software system interactions
  • Familiarity with modern development workflows including Git, CI/CD pipelines, and software engineering best practices

Nice To Haves

  • System software development or validation for AI/ML accelerators or custom ASIC/SoC hardware platforms
  • Deploying and managing diagnostics and software tests in high-volume manufacturing environments (e.g., factory test, L10)
  • Working with OpenBMC, Redfish, or other modern BMC firmware stacks and related standards
  • Deep knowledge of system-level security: threat modeling, secure boot, and secure development lifecycle practices
  • Managing container technologies like Docker and Kubernetes at the node/system level
  • Familiarity with working alongside server contract manufacturers in APAC, including logistics and test support coordination
  • Current Development or Validation Managers/Directors (or those managing combined teams) from semiconductor, server hardware, cloud infrastructure, or HPC companies.
  • Senior technical leaders or architects in system software or validation with demonstrated leadership experience, strong mentorship skills, and readiness for management.
  • Managers who have led teams responsible for delivering and qualifying foundational software for complex hardware systems, including enabling manufacturing test.
  • Individuals with a strong track record of building and managing teams focused on firmware, kernel, OS development, and system test, prioritizing technical excellence, quality, and talent development.

Responsibilities

  • Team Leadership & Talent Development: Lead, manage, and inspire high-caliber teams of system software developers and system test engineers.
  • Team Building: Build and scale world-class system software and test teams by attracting, hiring, and retaining top-tier engineering talent.
  • Technical Strategy & Roadmap: Define and drive the technical strategy, architecture, and development roadmap for the entire system software stack (UEFI/BIOS, BMC, RoT, Drivers, OS, and Monitoring).
  • Coach and Mentor: Actively coach and mentor team members, fostering professional growth through challenging assignments, targeted development plans, and continuous feedback. Cultivate a team culture focused on technical excellence and results.
  • Execution & Delivery: Oversee the end-to-end software development lifecycle, including design, implementation, testing, validation, and release.
  • System Validation Leadership: Provide direction and oversight for the System Test engineering teams responsible for the validation and qualification of the Etched ML system software stack.
  • Manufacturing Test Integration: Collaborate closely with Manufacturing Operations, Test Engineering, and external partners (CMs) to deliver system software testing and diagnostics to manufacturing environments.
  • Cross-Functional Collaboration: Partner effectively with ASIC design, hardware platform engineering, and external manufacturing partners to ensure seamless hardware/software integration and address system-level challenges.
  • Golden Image Management: Oversee the creation, maintenance, and release process for the validated golden reference container images.
  • Resource Management: Manage project priorities, deadlines, and resources effectively across development and test teams for multiple concurrent projects.

Benefits

  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office