Principal Architect - Quality Engineering

DataDirect Networks
Remote

About The Position

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." - IDC

"The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments" - Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence. Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project.

This is a chance to make a significant impact at a company that is shaping the future of AI and data management. Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage.

We are seeking a highly experienced and technically versatile Quality Engineering Architect to lead the end-to-end quality strategy for Infinia, DDN's highly distributed data intelligence platform. This role demands deep hands-on expertise in test automation frameworks, system-level validation, CI/CD integration, and large-scale platform reliability testing.

You will be responsible for shaping the vision and execution of quality practices across Infinia's distributed architecture, from low-level I/O and memory handling to high-throughput applications, multi-tenant services, and NVMe-backed storage systems. This is a pivotal technical leadership role for someone who can architect at scale, automate with precision, and inspire quality-first thinking across engineering teams.

Requirements

  • 15+ years of experience in software quality engineering, with extensive expertise in test architecture for distributed or cloud-native platforms.
  • Deep experience with automation development using Python, Bash, and Git-based CI/CD environments.
  • Proven success in architecting test frameworks across complex infrastructure components or platform services.
  • Strong understanding of file systems, I/O stack behavior, storage profiling (NVMe, SPDK), and network observability.
  • ISTQB Certified (or equivalent) with demonstrated leadership in quality process transformation and automation maturity.

Nice To Haves

  • Hands-on experience validating large-scale data platforms, file systems, or scheduling engines.
  • Familiarity with observability stacks (e.g., OpenTelemetry, Grafana, Prometheus) and system profiling tools.
  • Experience with compliance testing (e.g., Section 508, HIPAA, PCI) and security feature validation (SAML, access control, backup/restore).
  • Contributions to open-source testing frameworks or community-recognized QA initiatives.

Responsibilities

  • Define the Infinia-wide quality engineering architecture, driving consistency, testability, and observability across all subsystems (e.g., I/O Path, SPDK Data, Memory, Task Scheduling, Platform Services).
  • Architect and evolve end-to-end test frameworks for performance, correctness, data integrity, and security compliance.
  • Lead the design and implementation of modular and scalable automation frameworks using tools like Robot Framework, Selenium, Pytest, Postman, JMeter, and containerized test environments with Docker, Jenkins, and OpenShift.
  • Design reusable automation templates and CI pipelines to support rapid iteration and deployment with full validation gates.
  • Build and maintain comprehensive test plans that validate performance, workload behavior, concurrency, and cross-platform reliability at scale.
  • Leverage profiling, fuzzing, and failure injection methodologies to ensure system stability under load, including NVMe performance tracing, network saturation, and long-duration testing scenarios.
  • Partner with architects, developers, DevOps, and customer engineering to integrate quality into early-stage system and feature design.
  • Lead risk-based test planning, root cause analysis for systemic issues, and verification of security and compliance requirements.
  • Mentor engineers on test design, automation best practices, and quality standards across platform teams.