Sr. HPC Systems Architect (Storage)

KLA
Ann Arbor, MI
Onsite

About The Position

In this senior role, you will own the architecture, deployment, and long‑term scalability of enterprise HPC storage and compute platforms. You’ll lead systems from early design through production, partnering across engineering, manufacturing, and vendors to deliver high‑performance, highly available HPC infrastructure at scale. This role is ideal for someone who enjoys deep technical ownership and solving complex infrastructure challenges in real production environments. You’ll influence architectural decisions, build storage systems that scale, and work on HPC platforms used in mission‑critical production environments rather than proofs of concept.

Requirements

  • BS or MS in Computer Science, Computer Engineering, or a related field
  • 5+ years of progressive experience in HPC systems, storage, or large‑scale Linux infrastructure
  • Deep, hands‑on expertise in HPC storage and Linux‑based infrastructure
  • Strong, distro‑agnostic Linux experience (Rocky, RHEL, SUSE, Ubuntu)
  • Proven experience designing and operating large‑scale parallel storage systems
  • Strong understanding of HPC hardware platforms (servers, GPUs, networking, storage, BIOS/BMC)
  • Advanced Linux systems knowledge (PXE/netboot, systemd, HA concepts)
  • Solid networking fundamentals (TCP/IP, DNS, DHCP, LDAP, HTTP)
  • Strong scripting skills in Shell and Python
  • Experience with configuration management and automation (Salt, Puppet, Chef, etc.)
  • Ability to lead complex work independently while influencing cross‑functional teams
  • Requires a minimum of 8 years of related experience with a Bachelor's degree; or 6 years with a Master's degree; or 3 years with a PhD; or equivalent experience

Nice To Haves

  • Strong DevOps and automation mindset (CI/CD pipelines, Git, infrastructure as code)
  • Experience with containers for HPC (Singularity, Docker)
  • Monitoring and observability experience (Prometheus, Grafana)
  • Familiarity with Apache/Nginx and supporting infrastructure services

Responsibilities

  • Own the design, implementation, and ongoing support of high‑performance compute (HPC) clusters, taking accountability for system performance, reliability, and scalability
  • Serve as a technical authority for HPC storage, with deep hands‑on expertise in parallel file systems such as Lustre, GPFS, and BeeGFS
  • Apply advanced systems knowledge across CPU and GPU architectures, high‑bandwidth interconnects, and robust storage subsystems to deliver balanced, high‑performance solutions
  • Lead the creation of hardware BOMs for HPC clusters, working directly with vendors and coordinating hardware release activities
  • Design, configure, and optimize Linux operating systems for HPC environments, applying strong, distro‑agnostic Linux expertise
  • Translate project specifications and performance requirements into subsystem‑ and system‑level designs, driving execution while meeting technical and schedule commitments
  • Support the design, release, and transition of new systems to manufacturing and customers, providing high‑quality golden images, procedures, scripts, and documentation
  • Manage EOL part re‑qualification activities to ensure long‑term system viability and supportability
  • Act as a senior escalation point for complex in‑house and in‑field issues, providing hands‑on troubleshooting and resolution

Benefits

  • medical
  • dental
  • vision
  • life
  • 401(k) including company matching
  • employee stock purchase program (ESPP)
  • student debt assistance
  • tuition reimbursement program
  • development and career growth opportunities and programs
  • financial planning benefits
  • wellness benefits including an employee assistance program (EAP)
  • paid time off
  • paid company holidays
  • family care and bonding leave

What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Number of Employees: 5,001-10,000