HPC Linux System Administrator

MRI Technologies, Houston, TX

About The Position

MRI Technologies has an exciting opportunity for an HPC Linux System Administrator on the JETS II contract at NASA Johnson Space Center. You will support the Flight Sciences Laboratory (FSL), one of JSC's primary computing facilities, with over 700 machines, 26,000 cores, and 10+ petabytes of storage serving more than 1,000 users. The analyses running on FSL infrastructure support nearly every major NASA program, including the International Space Station (ISS), Orion, Space Launch System (SLS), Commercial Crew, Lunar Gateway, and Human Landing System.

You will work with a team of System Administrators to build and maintain all FSL services. Your responsibilities will include High Performance Computing (HPC) and high-end Linux workstation administration, high-speed parallel filesystem administration, and job scheduler administration. You will investigate problems and proactively monitor system health, and you will work closely with FSL users to make sure they can support the NASA human spaceflight mission.

What We Are Looking For

Requirements

  • Typically requires a bachelor's degree or equivalent certification in a related field, with a minimum of 5 years of experience
  • Linux system administration experience
  • HPC job scheduler administration experience
  • System configuration management experience
  • High-speed parallel file storage administration experience
  • Experience with monitoring and alerting systems
  • Demonstrated problem-solving, planning, and communication skills
  • Ability to work effectively in a team environment
  • Proof of U.S. Citizenship is a requirement for this position

Nice To Haves

  • Strong skills administering parallel filesystems such as Lustre or GPFS
  • Strong skills administering the Slurm job scheduling system
  • Experience with Red Hat-based Linux distributions
  • Familiarity with InfiniBand high-speed networking
  • Experience with provisioning tools (xCAT, Warewulf)
  • Experience with Ansible and/or Foreman for configuration management
  • Familiarity with the Spack software package manager
  • Experience with log consolidation, monitoring, and Git/GitLab (including CI/CD pipelines)
  • Knowledge of NASA security mechanisms (security plans, POAMs, ATOs, Risk Assessments)

Responsibilities

  • Working with a team of System Administrators to build and maintain all FSL services
  • Performing High Performance Computing (HPC) and high-end Linux workstation administration
  • Administering high-speed parallel filesystems
  • Administering the job scheduler
  • Investigating problems and proactively monitoring system health
  • Working closely with FSL users to make sure they can support the NASA human spaceflight mission

Benefits

  • Medical
  • Dental
  • Vision
  • Company-paid life and disability insurance
  • Paid time off
  • 401(k)
  • 9/80 work schedule (every other Friday off, when applicable)
  • Chance to work in one of JSC's most critical computing environments supporting human spaceflight