SSD Drive Qualification - Reliability Validation Engineer

Everpure
Santa Clara, CA
$217,000 - $326,000
Onsite

About The Position

We’re in an unbelievably exciting area of tech and are fundamentally reshaping the data storage industry. Here, you lead with innovative thinking, grow along with us, and join the smartest team in the industry. This type of work, work that changes the world, is what the tech industry was founded on. So, if you're ready to seize the endless opportunities and leave your mark, come join us.

As an SSD Reliability Validation Engineer, you will design and execute reliability and stress test plans for enterprise SSDs, with a primary focus on:

  • RDT campaigns to demonstrate lifetime reliability and field-equivalent stress.
  • 4-corner validation across temp / voltage / workload / media stress.
  • Customer-mode validation, including customer-specific feature modes, power / perf limits, and telemetry/OCP profiles.

You will work closely with NAND, firmware, hardware, systems, and manufacturing teams to define coverage, execute tests in automated environments, analyze results, and communicate clear recommendations on SSD readiness and risk.

Requirements

  • 5+ years of experience in SSD, storage, or hardware reliability / validation, ideally with enterprise or hyperscale products.
  • Strong understanding of NAND flash fundamentals (P/E cycling, wear-out, read-disturb, retention, ECC, PLP/holdup) and how they map into reliability tests.
  • Hands-on experience with RDT / ALT / ORT or equivalent reliability demonstration programs for SSDs or similar embedded products.
  • Proficient in reliability statistics and acceleration modeling for lifetime projections, with practical experience leveraging data science libraries and stat.
  • Proficient in JEDEC specifications, OCP datacenter storage profiles, and NVMe architectural requirements. Expert-level knowledge of industry-standard qualification and compliance frameworks.
  • Practical experience designing and executing 4-corner or environmental stress tests (voltage, temperature, workload corners).
  • Familiarity with NVMe / PCIe concepts (basic command flows, admin vs I/O path, error reporting, SMART/Telemetry, OCP extensions).
  • Strong Python (or similar) skills for test and tooling development; comfortable working in Linux-based lab environments.
  • Experience using test automation and CI (e.g., Jenkins, lab frameworks) to run large-scale, long-running test campaigns.
  • Solid data analysis skills to interpret error logs, telemetry, and large test datasets; able to turn data into clear engineering conclusions.

Responsibilities

  • Own end-to-end RDT and reliability demonstration plans for new SSDs, including workloads, sample plans, stress levels, and pass/fail criteria.
  • Plan and execute 4-corner and stress validation across voltage, temperature, workload, and media/background operations.
  • Translate customer requirements and modes into concrete reliability and stress tests, and provide clear readiness/risk readouts.
  • Develop and automate test content, harnesses, and CI/regression integration in Python/Linux-based environments.
  • Analyze logs and telemetry to debug issues, drive JIRA closure, and partner with NAND/FW/HW/systems/analytics teams on fixes and dashboards.
  • Utilize reliability statistics and acceleration models (e.g., Weibull) to architect sample plans, determine MTBF metrics, and define data-centric qualification thresholds.
  • Ensure RDT methodologies and workload profiles mirror industry-standard (e.g., JEDEC) benchmarks and real-world customer deployment scenarios.

Benefits

  • Flexible time off
  • Wellness resources
  • Company-sponsored team events