HPC Systems Engineer (Remote)

TEKsystems
Littleton, CO
$60 - $75
Remote

About The Position

TEKsystems is seeking an experienced HPC Systems Engineer to support the design, deployment, optimization, and ongoing operations of high‑performance computing (HPC) environments. This role supports advanced engineering and simulation workloads and requires strong Linux systems expertise, parallel computing knowledge, and hands-on experience operating and tuning HPC clusters. After an initial one-week onsite onboarding in Orlando, FL, this role will be fully remote.

Requirements

  • Minimum 3 years of professional experience in HPC-focused software or systems engineering
  • Strong hands‑on experience administering Linux systems (RHEL preferred)
  • Proficiency in one or more of the following: C, C++, Python, Bash, Ansible
  • Working knowledge of parallel computing models and frameworks, including MPI, OpenMP, and CUDA
  • Experience with HPC cluster deployment and administration, job scheduling and resource managers, and high‑performance networking (InfiniBand)
  • Demonstrated experience installing, configuring, troubleshooting, and tuning HPC workloads
  • Understanding of ITIL concepts related to incident, service, and change management
  • Strong analytical and problem‑solving skills
  • Excellent communication skills and the ability to adapt in a fast‑paced environment

Nice To Haves

  • Experience supporting engineering simulation or scientific computing workloads
  • Familiarity with performance profiling and benchmarking tools
  • Prior experience supporting enterprise or customer-facing HPC environments
  • Exposure to hybrid or cloud‑adjacent HPC solutions

Responsibilities

  • Design, deploy, administer, and support HPC systems running on Linux (RHEL) platforms
  • Install, configure, troubleshoot, and performance‑tune HPC clusters supporting engineering and simulation workloads
  • Support and optimize parallel computing applications using MPI, OpenMP, CUDA, and related frameworks
  • Configure and manage cluster management and job scheduling tools (e.g., Slurm, PBS, LSF)
  • Support and troubleshoot high‑speed interconnects, including InfiniBand
  • Develop and maintain automation and operational tooling using C, C++, Python, Bash, and Ansible
  • Perform root cause analysis and participate in structured incident, problem, and change management processes aligned with ITIL practices
  • Work closely with architects, developers, and customers to ensure system stability, performance, and scalability
  • Produce clear technical documentation and communicate effectively with both technical and non-technical stakeholders

Benefits

  • Medical, dental & vision
  • 401(k)/Roth
  • Insurance (Basic/Supplemental Life & AD&D)
  • Short and long-term disability
  • Health & Dependent Care Spending Accounts (HSA & DCFSA)
  • Transportation benefits
  • Employee Assistance Program
  • Time Off/Leave (PTO, Vacation or Sick Leave)
  • Critical Illness, Accident, and Hospital
  • 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available
  • Life Insurance (Voluntary Life & AD&D for the employee and dependents)
  • Health Spending Account (HSA)