HPC Storage Engineer | Experienced Hire

Susquehanna International Group, LLP • Bala Cynwyd (Philadelphia Area), PA

33d • Onsite

About The Position

We are looking for an experienced HPC Storage Engineer to design, implement, and optimize the storage and data movement infrastructure that underpins our high-performance computing (HPC) environment. This role focuses on distributed and parallel filesystems, storage systems, and large-scale data movement, ensuring reliable, high-throughput access to data for compute-intensive workloads. You will work closely with HPC platform engineers, compute and networking teams, and application users to deliver scalable, performant, and resilient storage solutions that tightly integrate the storage layer with compute nodes.

Requirements

Hands-on experience with parallel or distributed filesystems in production environments
Strong understanding of Linux systems administration
Experience with high-performance I/O, data locality, and throughput optimization
Proficiency in large-scale distributed systems development, preferably in C++
Proven ability to troubleshoot complex performance and reliability issues across storage and compute stacks
Experience with data transfer and movement tools

Nice To Haves

Familiarity with object storage and hierarchical storage management (HSM)
Experience integrating storage with HPC schedulers (e.g., Slurm) and compute workflows
Background supporting scientific, ML/AI, or other data-intensive workloads

Responsibilities

Design, deploy, and operate HPC storage systems and parallel/distributed filesystems (e.g., Lustre, GPFS/IBM Spectrum Scale, BeeGFS, Ceph)
Own data movement workflows across environments, including data ingest, replication, tiering, and archiving
Optimize filesystem and storage performance for large-scale parallel workloads
Design and tune load-balancing strategies across storage targets, metadata services, and data movement pipelines to ensure even utilization, high throughput, and predictable performance at scale
Troubleshoot storage, I/O, and data movement issues across HPC compute clusters
Develop and maintain automation for storage provisioning, monitoring, and lifecycle management
Partner with compute and networking teams to ensure end-to-end performance and reliability
Advise users and application teams on best practices for I/O patterns, data layout, and performance tuning
Evaluate and integrate new storage technologies and architectures as requirements evolve
