Principal Architect

DataDirect Networks
Remote

About The Position

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." - IDC

"The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments" - Marc Hamilton, VP, Solutions Architecture & Engineering, NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence. Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project.

This is a chance to make a significant impact at a company that is shaping the future of AI and data management. Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage.

We are seeking an exceptional Principal Software Engineer to join our team and lead the design, development, and evolution of next-generation storage infrastructure. This is a high-impact role where you will architect scalable, highly available storage solutions that power large-scale data-intensive workloads. You will collaborate closely with cross-functional teams, mentor engineers, and drive technical strategy for our distributed storage platforms.

Requirements

  • 15+ years of professional software engineering experience, with at least 8 years in a senior or principal role focused on backend or infrastructure systems.
  • Proven expertise in system design, including the ability to create scalable, maintainable architectures for complex distributed systems.
  • Deep understanding of distributed systems principles (consistency models, CAP theorem, consensus protocols, partitioning, replication, etc.).
  • Strong experience building or operating high-availability systems in production environments.
  • Hands-on experience with storage technologies: parallel filesystems (e.g., Lustre, GPFS/IBM Spectrum Scale, BeeGFS) and/or object storage systems (e.g., Ceph, S3-compatible APIs, MinIO, OpenStack Swift).
  • Proficiency in one or more management/orchestration frameworks (e.g., Kubernetes, Slurm, Mesos, or similar resource management systems).
  • Strong programming skills in Rust, C++, Go, or Java (Rust strongly preferred).
  • Excellent communication skills with a track record of influencing technical direction across teams.
  • Experience leading large-scale projects from conception through deployment and operations.

Nice To Haves

  • Experience in High Performance Computing (HPC) environments, including workload schedulers, burst buffers, or scientific computing storage workflows.
  • Contributions to open-source storage projects (e.g., Ceph, Lustre).
  • Familiarity with cloud-native storage solutions and multi-cloud architectures.
  • Background in performance tuning and benchmarking of storage systems at scale.
  • Experience with data durability, erasure coding, or tiered storage strategies.

Responsibilities

  • Lead the end-to-end design and implementation of large-scale storage systems, including architecture reviews, system design documents, and technical roadmaps.
  • Design and optimize distributed systems with a focus on high availability, fault tolerance, scalability, and performance.
  • Drive innovation in storage technologies, including parallel filesystems and/or object storage systems.
  • Collaborate with product managers, infrastructure teams, and other engineers to define requirements and deliver robust, production-ready solutions.
  • Mentor senior and junior engineers, conduct code and design reviews, and foster best practices in software engineering.
  • Troubleshoot complex production issues in distributed environments and implement long-term preventive solutions.
  • Contribute to open-source projects or internal tools that advance the state of storage and distributed systems.
  • Stay current with industry trends and evaluate emerging technologies for adoption.