The CoreHPC team at UCSF is seeking an HPC Systems Engineer to play a key role in the development, maintenance, and day-to-day operations of the Institute’s HPC clusters. The HPC Systems Engineer will apply advanced systems infrastructure concepts and skills to the operation and improvement of large-scale, highly complex research Cyber Infrastructure (CI) with unique computing, networking, and storage systems designed to address cutting-edge research problems. They will apply their engineering and design skills, selecting methods, techniques, and evaluation criteria, to develop new CI solutions for complex research problems, and will develop and enhance monitoring to maintain the integrity of CI systems. The engineer will be an active member of the support and maintenance efforts for the CoreHPC cluster: resolving user issues, fixing technical problems, resolving outages, patching, and maintaining system uptime and availability. The position also includes providing consultation, support, and guidance to researchers on how to address computational problems using standard tools, packages, and approaches. The engineer will participate in multiple technical projects simultaneously, apply working knowledge of security control frameworks to maintain the integrity of the CI systems and the research being performed on them, and give presentations to the team and to other technical units. Additionally, they will evaluate new technologies, including performing moderate to complex cost/benefit analyses. The engineer may also lead cross-functional technical working groups and projects that support onboarding research customers or making systems improvements.
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed