HPC System Administrator -3 (HPC servers, HPC clusters, SPD servers)

Akina · Annapolis, MD
$180,000 - $220,000

About The Position

System Administrators (HPC) must provide High Performance Computing (HPC) enhanced sustainment capabilities to two geographically dispersed areas. These capabilities cover multi-vendor HPC servers, HPC clusters, and SPD servers. The systems run Red Hat, CentOS, SUSE, and custom vendor-specific operating systems, with high-speed shared storage (for example, Lustre and GPFS) and dedicated high-speed, low-latency network interconnects such as InfiniBand and Slingshot.

High-speed shared parallel storage uses Lustre to provide performant shared storage between two or more HPCs in a data center. An interconnect service integrates the HPC systems with a dedicated high-speed network that connects several storage appliances to dedicated HPC LNets. This makes the appliances available to multiple HPCs to enhance their capabilities.

The HPC Operations Team must implement and manage the monitoring capability required to track the health, status, and performance of the entire system and its subcomponents (environmental, compute, storage, networks, and applications). It does this using COTS and GOTS toolsets such as Nagios, Splunk, and Prometheus.

System Administrators (HPC) must support the HPC and ABS (ABUNDANTSHIELD high-speed shared parallel storage) SRE teams. The contractor must follow Government-designated policies and procedures developed to enhance the teams’ ability to perform their sustainment responsibilities and to improve customer mission operations. These policies and procedures include:

- ticket tracking processes;
- change approval and change management processes;
- coding specifications;
- support for (and adherence to) the processes and policies associated with the Government-designated (and deployed) base SRE tools.

Requirements

  • B.S. in a technical discipline and 10 years’ experience as a System Administrator in programs and contracts of similar scope, type, and complexity, or 15 years’ experience in lieu of a degree.
  • Proficient with the following (as specific position requires):
  • DoD 8570 IAT II level certification required.
  • TS/SCI with FSP required.
  • Most recent polygraph within the last 7 years required.

Responsibilities

  • Provide support for implementation, troubleshooting and maintenance of IT systems
  • Manage the daily activities of configuration and operation of IT systems
  • Provide assistance to users in accessing and using IT systems
  • Provide Tier 1 (Help Desk) and Tier 2 (Escalation) identification, diagnosis, and resolution of problems
  • Provide support to IT systems including day-to-day operations, monitoring and problem resolution for all of the client/server/storage/network devices, mobile devices, etc.
  • Provide support for the escalation and communication of status to agency management and internal customers
  • Optimize system operations and resource utilization, and perform system capacity analysis and planning
  • Apply in-depth experience in troubleshooting IT systems
  • Provide detailed analysis and feedback to agency management and internal customers for escalated tickets
  • Provide support for the dispatch of system and hardware problems, and remain involved in the resolution process
  • Configure and manage Linux, Unix, and Windows (or other applicable) operating systems; install and load operating system software; troubleshoot, maintain the integrity of, and configure network components; and implement operating system enhancements to improve reliability and performance
  • Support the design of systems, mission architecture and associated hardware
  • Possess a working knowledge and understanding of system administration interdependencies as part of the Service Oriented Architecture (SOA)
  • Analyze and resolve complex problems associated with server hardware, applications and software integration

Benefits

  • 24 days PTO accrued annually and 11 federal holidays
  • Our 401k is 100% vested on your start date and the company makes a direct contribution worth 10% of your salary.
  • Akina covers 100% of healthcare costs for employees and 50% toward dependents.
  • We offer educational assistance towards college classes and will cover costs associated with job-related training and certifications.