Senior QA Engineer – Performance & Reliability

Axiado · San Jose, CA
$100,000 - $160,000

About The Position

Axiado is a top-tier manufacturer of TCU (Trusted Control/Compute Unit) solutions. We are hiring a Senior QA Engineer – Performance & Reliability to lead the performance characterization and reliability validation of our Secure TCU System, ensuring it meets rigorous data center standards. In this role, you will own the test design, execution, and deep-dive analysis for performance and reliability, working closely with development teams to identify bottlenecks and resolve complex system-level issues.

Requirements

  • 5+ years of experience in embedded system testing, with a strong focus on performance verification and reliability engineering.
  • Deep understanding of TCU, BMC, HMC, RoT (Root of Trust), Secure Boot, TPM, HSM, PCIe (Gen4/5), DDR memory, and networking protocols.
  • Proficiency with performance profiling tools, traffic generators, and standard benchmarks (e.g., SPEC, IOzone, iperf). Experience with thermal and power measurement tools.
  • Strong scripting skills in Python for test automation and data analysis; familiarity with C/C++ for code analysis is a plus.
  • Strong Linux/Unix skills, including kernel tuning, system monitoring, and log analysis.
  • Experience with CI/CD pipelines (Jenkins, GitLab CI) and version control (Git).
  • BS/MS degree in Computer Science, Electrical Engineering, or a related field.

Nice To Haves

  • Experience with AI-driven log analysis or anomaly detection tools to predict reliability issues.
  • Background in validation of high-speed interfaces (PCIe, CXL) and memory subsystems (DDR5/LPDDR5).
  • Experience with data center server architecture and thermal management.
  • Knowledge of industry reliability standards (e.g., Telcordia, JEDEC).

Responsibilities

  • Design and execute comprehensive test plans for performance benchmarking, stress testing, longevity/endurance testing, and thermal/power characterization of TCU/BMC systems.
  • Analyze system behavior under heavy workloads to identify performance bottlenecks in throughput, latency, and resource utilization (CPU, memory, PCIe).
  • Conduct Mean Time Between Failures (MTBF) prediction, long-duration stability tests, and error injection campaigns to validate system robustness.
  • Lead deep-dive investigations into performance degradation and reliability failures, using advanced debugging tools (oscilloscopes, logic analyzers, firmware traces) to isolate issues.
  • Work directly with firmware and hardware engineers to reproduce complex bugs, analyze crash dumps, and verify fixes.
  • Develop and maintain automated performance testing frameworks and reporting dashboards to track regressions and trends over time.
  • Generate detailed performance assessment reports and reliability analysis metrics for stakeholders.
  • Mentor junior engineers on performance testing methodologies and system debugging techniques.

Benefits

  • Access to the world's leading research, technology and talent.