High-Performance Computing (HPC) Systems Administrator

Mass General Brigham
Boston, MA
Hybrid

About The Position

The Martinos Center for Biomedical Imaging at Massachusetts General Hospital seeks a dedicated and highly motivated High-Performance Computing (HPC) Systems Administrator (Sysadmin) to oversee and optimize the center's HPC cluster, a core computational resource supporting cutting-edge biomedical and neuroimaging research. The HPC Sysadmin will play a critical role in maintaining and enhancing the cluster's performance, supporting researchers in their computational workflows, and ensuring the scalability and reliability of the system. This role is ideal for an individual with strong experience in HPC systems administration, an understanding of scientific computing needs, and the ability to work collaboratively with researchers from diverse disciplines.

Work Environment

This position is based at the Martinos Center for Biomedical Imaging in the Charlestown Navy Yard. It offers a hybrid work environment, allowing for a combination of remote work and on-site responsibilities. The candidate must be located within a commutable distance of Charlestown, MA, and be available to attend regular in-person meetings with the Center's faculty and leadership.

Why Join Us?

  • Work in a multidisciplinary environment supporting groundbreaking research in computational methods, neuroscience, cancer, and cardiovascular health.
  • Operate a state-of-the-art HPC cluster in collaboration with world-class researchers and scientists.
  • Be part of a team dedicated to pushing the boundaries of technology in biomedical imaging.

Requirements

  • Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
  • 3+ years of experience in HPC systems administration or equivalent.
  • Strong expertise in Linux systems administration (e.g., CentOS, RHEL, Ubuntu) in an HPC environment.
  • Experience with job scheduling using Slurm.
  • Proficiency in HPC-related programming and scripting languages (e.g., Bash, Python, Perl).
  • Familiarity with parallel computing, distributed systems, and scientific computing frameworks.
  • Hands-on experience with storage systems, networking, and security in an HPC environment.
  • Excellent interpersonal and communication skills for interacting with researchers and non-technical staff, and previous experience working with researchers.
  • Demonstrated ability to adapt to changing technologies, workflows, and priorities in a dynamic research environment.
  • Strong organizational and time-management skills to efficiently manage multiple concurrent projects and tasks.

Nice To Haves

  • Advanced degree in Computer Science, Engineering, or a related field.
  • Knowledge of biomedical or neuroimaging applications and related software (e.g., FreeSurfer, FSL, SPM, ANTs, MATLAB).
  • Experience with machine learning workflows and GPU-based computing (e.g., PyTorch, CUDA, TensorFlow).
  • Familiarity with data-intensive workflows and large-scale storage systems.

Responsibilities

  • Cluster Management: Oversee the day-to-day operations, maintenance, and optimization of the Martinos Center's HPC cluster, ensuring high availability, reliability, and performance.
  • Perform hardware and software upgrades, patching, and troubleshooting of HPC nodes, storage, and networking.
  • User Support: Provide technical support and guidance to researchers and staff using the HPC cluster for computational tasks, such as neuroimaging, machine learning, and data analysis.
  • Assist users with job scheduling, resource allocation, and troubleshooting.
  • System Monitoring and Performance Optimization: Develop and implement robust monitoring tools to track resource utilization and identify performance bottlenecks.
  • Analyze workloads and provide recommendations for optimization of computational workflows.
  • Collaboration and Training: Collaborate with researchers to understand their computational needs and assist in designing tailored HPC solutions for their projects.
  • Develop training materials and lead workshops to educate researchers on best practices for using the cluster.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Number of Employees

5,001-10,000 employees
