High Performance Computing Systems Administrator

University of Akron, Akron, OH
Onsite

About The Position

Job Summary: Provide systems programming and management functions for high-performance computing (HPC) systems maintained both on premises and through cloud HPC service providers, within a high-performance research computing environment. Work as a member of the systems team on the administration, integration, and maintenance of parallel high-performance computing systems and clusters, as well as other systems and peripherals, including advanced file systems, enterprise storage systems, visualization environments, and networks. Contribute to complex system integration, deployment, and administration projects, system performance analyses, problem resolution, and system security initiatives. Work with senior staff on the development of system management strategies, architectural assessments, system tools, and software for the administration of the high-performance computing systems. Provide technical assistance and consultation for faculty, researchers, students, and technical staff on the use of high-performance computing platforms. Duties are split between management of externally funded and University of Akron-owned high-performance computing resources.

Essential Functions:

  • 30%: Provide systems support for the advanced research computing environment, including the installation, integration, and management of high-performance computer systems, clusters, operating systems, peripherals, and system interfaces; monitor system usage; ensure that the high-performance computing complex is operating at optimal performance and reliability levels. Additional duties include consulting, training, and the development and maintenance of systems documentation.
  • 30%: Monitor hardware, software, virtual infrastructure, and on-premises and cloud-based HPC applications. Monitor externally funded computer resources. Identify and correct problems. Generate and analyze usage reports and system configurations for performance tuning and capacity planning. Assist in planning and maintaining data center facilities and externally funded resources. Manage backup services.
  • 20%: Work with users and other computational professionals in evaluating user requirements, and in the configuration and deployment of computational resources. Participate in the configuration and tuning of batch queuing systems in a massively parallel production environment; collect parallel system utilization statistics; identify and resolve computer system anomalies and operational problems; and provide systems support and file sharing services. Act as primary liaison for external funding agencies and provide technical support, training, and guidance as required.
  • 20%: Maintain an understanding of state-of-the-art computing systems and peripherals; computer operating systems; and scalable, parallel architectures. Research and evaluate new and emerging technologies.

Additional Position Information:

  • Education: Requires some College Courses or High School Diploma and training in related field. Prefers a relevant Bachelor's Degree.
  • Licenses/Certifications/Requirements: None.
  • Experience: Requires a minimum of 2 years’ experience in server operating systems (preferably Unix/Linux), server management, computer system development, networking protocols, and programming (preferably Bash, Python). Highly developed problem solving, communication, and technical writing skills required. Ability to adapt to new technology and maintain currency in technical knowledge is required. Will need to be on site, as the position requires work with physical equipment.

Requirements

  • Requires some College Courses or High School Diploma and training in related field.
  • Requires a minimum of 2 years’ experience in server operating systems (preferably Unix/Linux), server management, computer system development, networking protocols, and programming (preferably Bash, Python).
  • Highly developed problem solving, communication, and technical writing skills required.
  • Ability to adapt to new technology and maintain currency in technical knowledge is required.
  • Will need to be on site, as the position requires work with physical equipment.

Nice To Haves

  • Knowledge of SLURM scheduling software.
  • Familiarity with Spack or a similar system for package management, as well as OpenHPC / Warewulf for provisioning.
  • Comfortable with multiple distributions of Linux (Ubuntu, Rocky, CentOS, Oracle, etc.).
  • Comfortable with InfiniBand networking equipment and software.
  • Familiar with NVIDIA CUDA programming.

Responsibilities

  • Provide systems support for the advanced research computing environment, including the installation, integration, and management of high-performance computer systems, clusters, operating systems, peripherals, and system interfaces
  • Monitor system usage
  • Ensure that the high-performance computing complex is operating at optimal performance and reliability levels
  • Provide consulting and training, and develop and maintain systems documentation
  • Monitor hardware, software, virtual infrastructure, and on-premises and cloud-based HPC applications
  • Monitor externally funded computer resources
  • Identify and correct problems
  • Generate and analyze usage reports and system configurations for performance tuning and capacity planning
  • Assist in planning and maintaining data center facilities and externally funded resources
  • Manage backup services
  • Work with users and other computational professionals in evaluating user requirements, and in the configuration and deployment of computational resources
  • Participate in the configuration and tuning of batch queuing systems in a massively parallel production environment
  • Collect parallel system utilization statistics
  • Identify and resolve computer system anomalies and operational problems
  • Provide systems support and file sharing services
  • Act as primary liaison for external funding agencies and provide technical support, training and guidance as required
  • Maintain an understanding of state-of-the-art computing systems and peripherals; computer operating systems; and scalable, parallel architectures
  • Research and evaluate new and emerging technologies

Benefits

  • The University of Akron offers a competitive total compensation package comprising a competitive salary and comprehensive benefits for eligible employees, including medical, dental, vision, short- and long-term disability, life insurance, and paid leaves of absence, including time off for illness, vacation, and maternity or paternity leave.
  • In addition, eligible employees and their dependents are provided tuition remission.
  • All staff and eligible non-bargaining unit faculty have the option to request a Flexible Work Arrangement (FWA).
  • The University of Akron participates in state retirement systems and offers alternative retirement options with competitive employer contributions.
  • Optional investment opportunities are available including deferred compensation programs (403(b) and 457(b)).

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: High school or GED
  • Number of Employees: 1,001-5,000 employees

© 2024 Teal Labs, Inc