Staff Hardware Diagnostics Engineer

Cerebras Systems | Sunnyvale, CA
1d | $150,000 - $260,000 | Onsite

About The Position

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.

Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra high-speed inference.

Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

Role Overview

We are seeking an experienced Hardware Diagnostics Engineer to design, develop, and maintain diagnostic software and test infrastructure for complex hardware systems. You will work closely with hardware design, firmware, and manufacturing teams to ensure product quality, reliability, and field serviceability across multiple board types and system configurations.

Requirements

  • 8–10 years of direct experience in one or more of the following areas:
      • Hardware diagnostic engineering: developing and maintaining diagnostic tools and frameworks for hardware validation, failure analysis, and root cause investigation.
      • Embedded systems development: building firmware and low-level software for embedded platforms, including board bring-up, driver development, and hardware-software integration.
  • Education: Bachelor’s degree in Electrical Engineering (BSEE), Computer Engineering (BSCE), or a related field (Master’s degree preferred).
  • Proven Track Record: Experience shipping at least 2–3 major hardware products from concept through mass production.
  • Strong proficiency in C programming for embedded and systems-level software.
  • Experience with embedded Linux environments and cross-compilation toolchains (ARM, AArch64).
  • Deep understanding of hardware interfaces: I2C, SPI, UART, PCIe, GPIO, JTAG.
  • Proficiency with scripting languages (Bash, Python) for test automation and data analysis.
  • Ability to read and interpret hardware schematics and datasheets.
  • Experience with version control (Git) and collaborative development workflows.

Nice To Haves

  • Familiarity with power subsystems (voltage regulators, power sequencing, fault detection).
  • Strong knowledge of Makefiles, build systems, and dependency management for multi-target builds.
  • Solid understanding of CLI design, parameter parsing, and interactive console interfaces.
  • Ability to synthesize data from telemetry and logs to identify hardware-firmware marginalities.
  • Communication: Exceptional ability to explain complex hardware failures to non-technical stakeholders.

Responsibilities

  • Diagnostic Software Development: Architect, implement, and maintain board-level and system-level diagnostic tests in C/C++ for embedded platforms (AArch64, x86_64).
  • Root Cause Analysis (RCA): Lead deep-dive investigations into intermittent or systemic hardware failures at board or system levels and help improve diagnostic software to catch these failures early.
  • Test Coverage & Strategy: Define diagnostic test plans and coverage strategies for new and existing hardware — including power systems, I/O modules, fans, sensors, and management controllers.
  • CLI & User Interface: Design and maintain command-line interfaces for interactive and scripted diagnostic execution, including remote diagnostics.
  • Failure Analysis: Analyze diagnostic test results, triage hardware failures, and provide root-cause analysis for manufacturing and field issues.
  • Board Bring-Up Support: Collaborate with hardware engineers during board bring-up to develop and execute early validation and debug diagnostics.
  • Cross-Functional Leadership: Act as the technical bridge between hardware engineering and manufacturing, ensuring diagnostic tools are optimized for factory-line throughput and accuracy.
  • Mentorship: Provide technical guidance and mentorship to junior diagnostic and validation engineers.