HPC System Administrator

Santa Clara University
Santa Clara, CA
$129,000 - $161,265

About The Position

The High-Performance Computing (HPC) System Administrator is an expert, hands-on role responsible for the design, configuration, optimization, and operation of the organization's high-performance computing infrastructure. This individual will focus on advanced system optimization, complex troubleshooting, and strategic planning for future infrastructure enhancements across compute, storage, and high-speed interconnects (InfiniBand). A key responsibility is to mentor and cross-train existing system administrators, building the team’s collective HPC expertise, strengthening shared support capabilities, and ensuring long-term operational resilience and efficiency.

The HPC System Administrator is a member of the Enterprise Systems team within the Cyberinfrastructure Technologies department. The incumbent works with the other Cyberinfrastructure teams (Network and Telecommunications, Enterprise Applications, and the Information Security Office) and with other campus divisions to coordinate services, provide support, and offer appropriate guidance. The incumbent will also work with University vendors and partners. The HPC System Administrator will have a passion for providing excellent customer service; a focus on continual improvement across all units; a commitment to supporting innovative infrastructure technologies; and a desire to identify and deliver the best possible technology resources and services to meet the needs of the campus community.

Requirements

  • General: Knowledge of information technology, campus technology, and information security issues and trends in higher education, and ability to continually develop new knowledge regarding the same.
  • Ability to listen and understand customer needs.
  • Ability to plan, implement, and evaluate customer service initiatives.
  • Ability to work in a collaborative environment, as either a member or leader of a team, to meet deadlines and achieve goals.
  • Ability to manage a diverse workforce to provide excellent customer service.
  • Self-motivated and shows initiative.
  • Ability to successfully manage multiple projects simultaneously.
  • Proven track record in project planning and project management.
  • Ability to exercise independent judgment and engage in critical thinking and problem solving.
  • Ability to work effectively under pressure in a busy (sometimes chaotic) and demanding information services environment.
  • Ability to explain technical issues and policies to non-technical people.
  • Ability to give presentations on technical issues to a broad range of audiences.
  • Ability to foster and maintain good working relationships with faculty, administrators, students, senior management, and other leaders.
  • Ability to handle sensitive matters with diplomacy and the ability to mediate between competing parties.
  • Ability to maintain confidentiality and manage confidential information.
  • Must possess impeccable integrity.
  • Ability to speak truth to power.
  • Appreciation for the University’s mission, vision, values, priorities, procedures, and policies.
  • Position-specific: Knowledgeable and experienced in large-scale computer center operations with multiple systems running Linux and Windows Server operating systems
  • Experience with managing and operating SAN storage environments
  • Strong proficiency in the management of multi-platform hardware and software environments, including Microsoft and Linux (Red Hat)
  • Experience with configuration management tools such as Ansible and Warewulf
  • Experience with Slurm and job scheduling
  • Strong proficiency with scripting languages (Python, Bourne Shell, Perl, etc.)
  • Experience with compiling software packages and managing software modules in an HPC environment (EasyBuild, Lmod)
  • Experience with racking servers and adding PCI cards
  • Experience with LDAP and DNS
  • Experience with parallel file systems
  • Experience with InfiniBand networks
  • The University technology environment is very dynamic and challenging. The University seeks a person with a wide breadth of experience who can adapt to change while working in a complex technology infrastructure environment.
  • Experience with vSphere, ESXi
  • Experience in using and configuring system monitoring tools
  • Experience with enterprise backups
  • Experience with cloud providers
  • Skilled technical troubleshooter. Must be able to analyze and solve complex problems.
  • Knowledgeable in the use of a personal computer and standard productivity tools
  • Experience interacting and working with other people in a successful customer service capacity
  • Knowledge of industry trends in enterprise infrastructure and data center technology, including automation tools, cloud technology, disaster recovery, virtualization, networking, security, and other pertinent areas
  • Experience with Identity and Access Management (IAM)
  • Excellent interpersonal, written and verbal communication skills
  • Demonstrated ability to work in a collaborative, team environment
  • Strong organizational skills and ability to multitask
  • Must be a “self-starter” and show initiative to proactively identify and resolve problems
  • Must have the ability to acquire and apply new skills quickly
  • Strong customer service orientation
  • Understands the role of enterprise computing in University business processes
  • Works under limited supervision
  • Bachelor’s degree in a directly applicable field of study (Computer or Electrical Engineering, Math/Computer Science, Operations and Management Information Science)
  • 8+ years of applicable experience in the operation, maintenance, support, and design of enterprise-wide computer center systems, with demonstrated increasing responsibilities
  • 2+ years of experience supporting an HPC environment required, including experience with Slurm or a similar workload manager; InfiniBand or a similar high-speed interconnect; and Lustre or a similar parallel file system

Nice To Haves

  • Advanced degree in a directly applicable field of study or a field of management preferred
  • Experience serving the needs of higher education or research organizations is desirable

Responsibilities

  • HPC Infrastructure Management and Optimization
  • Workload Management and System Deployment
  • Team Development and Strategic Planning
  • Coordination and Collaboration
  • Resource Planning
  • Service Delivery
  • Service Optimization
  • Communication
  • Operations
  • Other duties as assigned by the Manager of Enterprise Systems and IS leadership.

Benefits

  • Santa Clara University offers a comprehensive benefits package for benefit-eligible employees, with programs and resources designed to promote and sustain the personal health care, well-being, and financial objectives of our employees and their families.