HPC Storage Systems Group Leader

Lawrence Berkeley National Laboratory, Berkeley, CA
Hybrid

About The Position

The National Energy Research Scientific Computing Center (NERSC) is seeking a knowledgeable and inspired group leader for the Storage Systems Group (SSG). This role is responsible for developing NERSC’s storage strategy, aligning with the systems roadmap, science workflows, and user needs. The SSG Lead will provide vision and guidance for the design, operation, and simplification of the storage environment for NERSC’s 11,000+ users.

The SSG manages NERSC’s storage portfolio, including large-scale parallel file systems and archival storage, balancing performance, stability, and usability across various DOE mission areas and scientific domains. The current storage environment includes a hierarchical storage management system (HPSS) storing over 450 PB and a large-scale parallel community file system (Storage Scale) with over 150 PB of online storage. The SSG will also manage scratch and new quality of service storage systems for Doudna, NERSC’s next-generation GPU-based supercomputer operational in 2027.

The Lead provides technical leadership to a team of skilled storage engineers, fostering collaboration to deliver innovative solutions and a technical vision for NERSC’s future storage platforms. The Lead will investigate new storage technologies, engage with vendors, and work with the Data Center Department Head to align group priorities with NERSC’s strategic plan.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, Applied Mathematics, Computational Science (or related fields) and current applicable systems support and engineering experience, plus a minimum of 3 years of experience managing a complex computer systems, storage, or networking unit.
  • Experience with storage technologies in a Linux environment, such as InfiniBand, RoCE, SAN/NAS, NFS, pNFS, hierarchical storage management systems (such as HPSS), Lustre, Storage Scale, VAST, and object stores.
  • Prior experience with HPC applications, workflows and computational and storage systems.
  • Experience in managing and supporting a 24/7 IT environment.
  • Ability to mentor staff to increase their knowledge and skills.
  • Deep and broad knowledge of storage technologies such as parallel filesystems (e.g. Storage Scale), hierarchical storage management (e.g. HPSS), distributed storage systems (e.g. VAST), and storage networking (e.g. InfiniBand or RoCE).
  • Demonstrated ability to work independently as well as collaboratively in large projects, and contribute to an active intellectual environment.
  • Ability to gather requirements from the scientific user community and turn requirements into system characteristics.
  • Strong technical and collaboration skills needed to create and deploy innovative ways of allowing our diverse user base to effectively utilize the unique resources that NERSC provides.
  • Ability to balance technical solutions with user needs and to show initiative, tact and good judgment in developing solutions to problems.
  • Excellent written and verbal communication skills.

Nice To Haves

  • A Master’s or PhD degree in related fields.
  • Knowledge of object storage and non-volatile storage technologies.
  • Experience administering and deploying storage systems of tens of petabytes (or greater) scale in an HPC environment.

Responsibilities

  • Develop NERSC’s storage strategy based on NERSC’s systems roadmap, science workflows and user needs.
  • Lead a team that procures, installs, manages, supports and monitors NERSC’s large-scale storage systems, including providing 24/7 support.
  • Ensure NERSC’s storage systems meet the needs of NERSC’s 11,000+ users by providing high-performing, available, and usable systems.
  • Work independently and as part of the Storage Systems Group to diagnose and fix storage problems, help analyze storage system issues, and develop and implement workarounds and/or patches for software bugs.
  • Provide effective line management to a group of approximately 10 Computer Systems Engineers by hiring excellent staff and working closely with SSG staff members. Ensure staff are meeting goals, provide both positive and constructive feedback to staff and ensure all staff have career growth opportunities.
  • Provide technical leadership for implementation and deployment efforts for storage system improvements that enhance task automation, reliability, stability, usability, performance, and security.
  • Continuously evaluate new storage technologies and make recommendations on future storage strategy and directions for the center, including both parallel and hierarchical storage, that would create new capabilities and enhance storage and HPC system performance and usability.
  • Work closely with other teams at NERSC to enable large-scale simulation, data analysis and AI applications to run on NERSC supercomputing and storage systems.
  • Provide budgetary input and oversight for NERSC’s storage systems.
  • Lead or collaborate on efforts with other Department of Energy (DOE) Labs on future storage technologies, multi-lab storage efforts and other related topics.
  • Present at conferences and talks to promote NERSC to other national labs and HPC sites.
  • Create and develop a vision and strategy for the group and be a key part of NERSC’s management team.

Benefits

  • Exceptional health and retirement benefits, including pension or 401(k)-style plans
  • Opportunities to grow in your career - check out our Tuition Assistance Program
  • A culture where you’ll belong - we are invested in our teams!
  • In addition to accruing vacation and sick time, we also have a Winter Holiday Shutdown every year.
  • Parental bonding leave (for both mothers and fathers)
  • Pet insurance