HPC Storage Systems Engineer

Lawrence Berkeley National Laboratory | Berkeley, CA
13h | $218,364 | Hybrid

About The Position

The National Energy Research Scientific Computing Center (NERSC) is inviting applications for the position of HPC Storage System Administrator. NERSC’s mission is to accelerate scientific discovery through high performance computing and data analysis for the DOE Office of Science programs. NERSC provides critical HPC and data systems and support for its 11,000+ users researching high energy physics, materials sciences, chemistry, fusion, and other DOE mission areas.

The position is focused on extreme-scale high performance storage. The NERSC Storage Systems Group provides petabytes of capacity and terabytes per second of bandwidth to the NERSC user community. In this role, the incumbent will work with 7-10 system engineers and programmers in the Storage Systems Group, collaborating to help architect, deploy, and manage NERSC’s storage hierarchy, which is composed of Lustre and Storage Scale (formerly GPFS) file systems, VAST, and two HPSS tape archive systems. We seek an experienced, motivated high performance storage administrator who has broad knowledge of storage hardware and software technologies, in particular hierarchical storage management systems and object stores.

In this role, you will be responsible for the day-to-day administration of NERSC’s HPSS service and will contribute significantly to the development and execution of NERSC’s Storage Strategy, particularly with respect to mass storage. At the CSE3 level, the incumbent will be responsible for operating and maintaining the NERSC mass storage service (HPSS) as part of a team, as well as contributing to the NERSC Storage Strategy. At the CSE4 level, the incumbent will lead the HPSS effort with a few key Storage System engineers under their direction. The selected candidate(s) will be hired at the Computer Systems Engineer 3 or 4 (CSE3 or CSE4) level, depending on their skills and experience.

Requirements

  • For the CSE3 level: Bachelor’s degree and a minimum of 8 years of computing or storage experience; or a Master’s degree and 6 years; or equivalent experience
  • Wide-ranging expertise in the areas of mass storage solutions (such as HPSS) and storage networking technologies (such as RDMA, RoCE, InfiniBand and Fibre Channel)
  • Experience managing storage systems
  • Excellent technical troubleshooting skills with the ability to resolve complex issues in creative and effective ways
  • Knowledge of trends in storage system hardware and software
  • Strong communication skills, and the ability to work independently and collaboratively as part of a creative and diverse team
  • Ability to script in Python, Perl, shell or another interpreted language
  • Knowledge of block storage arrays, storage networks, parallel file systems, hierarchical storage systems and object stores
  • Ability to resolve complex issues in creative and effective ways
  • Ability to network and collaborate with key contacts outside of their own area of expertise
  • Excellent oral and written communication skills
  • Demonstrated ability to work effectively as part of a cross-disciplinary team
  • For the CSE4 level: Bachelor’s degree and a minimum of 12 years of computing or storage experience; or a Master’s degree and 8 years; or equivalent experience
  • Broad expertise and/or unique knowledge in the areas of mass storage solutions (such as HPSS), storage networking technologies (such as RDMA, RoCE, InfiniBand and Fibre Channel), NFS, storage tiering, and storage performance tuning
  • Experience providing direction to a project team, or leading a team of systems or storage administrators
  • Experience architecting storage system solutions to meet user requirements
  • Experience administering or developing HPSS, Versity or other hierarchical storage management systems
  • Experience troubleshooting high performance data transfer applications
  • Experience with an automated software provisioning and configuration management system
  • Understanding of file system internals, or prior work developing storage systems
  • Good understanding of data transfer protocols, for example TCP/IP, IB verbs or RoCE
  • Knowledge of typical Unix file system structure
  • Ability to work on and resolve significant and unique issues where analysis of situations or data requires an evaluation of intangibles
  • Ability to exercise independent judgment in methods, techniques and evaluation criteria for obtaining results

Nice To Haves

  • Experience presenting technical information at conferences and meetings

Responsibilities

  • Participate in projects to architect, deploy and manage NERSC’s mass storage hierarchy
  • Contribute to the effort to manage and maintain the HPSS systems
  • Perform day-to-day administration of complex tape-based storage systems
  • Analyze storage usage and system monitoring data
  • Administer storage servers and block storage arrays
  • Participate in the management of the storage area network
  • Troubleshoot and debug problems in our production storage systems
  • Help define storage requirements for NERSC, ensuring that NERSC users’ needs are represented
  • Engage with NERSC users to identify projects which will improve data management and movement at the center
  • Identify and evaluate new storage hardware and software technologies and features
  • Participate in 24x7 on-call rotation
  • Work on and resolve complex issues where analysis of situations or data requires an in-depth evaluation of variable factors
  • Exercise judgment in selecting methods, techniques and evaluation criteria for obtaining results
  • Determine methods and procedures for new assignments; may coordinate the activities of other personnel
  • Network with key contacts outside of their own area of expertise
  • Lead the mass storage system administrators team within the Storage Systems Group, leading the effort to manage and maintain the HPSS systems
  • Lead projects to architect, deploy and manage NERSC’s mass storage hierarchy
  • Work on and resolve significant and unique issues where analysis of situations or data requires an evaluation of intangibles
  • Exercise independent judgment in methods, techniques and evaluation criteria for obtaining results

Benefits

  • Exceptional health and retirement benefits, including pension or 401(k)-style plans
  • Opportunities to grow in your career - check out our Tuition Assistance Program
  • A culture where you’ll belong - we are invested in our teams!
  • In addition to accruing vacation and sick time, we also have a Winter Holiday Shutdown every year.
  • Parental bonding leave (for both mothers and fathers)
  • Pet insurance