Senior HPC Engineer

Texas A&M University System, College Station, TX
Onsite

About The Position

We are making a bold leap into the future of artificial intelligence with a $45 million investment in an NVIDIA DGX SuperPOD. This investment underscores our commitment to meeting the cutting-edge research and supercomputing needs of faculty and staff across all Texas A&M System members. As a Senior High Performance Computing (HPC) Engineer, you will provide technical expertise and consultation for the design and deployment of HPC systems. Get in on the ground floor with a team that is shaping the next generation of innovation. This position is security-sensitive and requires U.S. citizenship.

Requirements

  • Bachelor’s degree in applicable field or equivalent combination of education and experience
  • 12 years of related experience
  • Experience with High Performance Computing (HPC) environments
  • Advanced Linux system administration skills
  • Familiarity with computer networking concepts and protocols
  • Experience with container orchestration tools such as Kubernetes
  • Knowledge of Run:ai for AI workload management
  • Proficiency with Slurm workload manager
  • Experience working with NVIDIA DGX systems
  • Understanding of virtualization technologies
  • Familiarity with Infrastructure as a Service (IaaS) platforms
  • Experience with DDN storage solutions
  • Knowledge of network-attached storage systems
  • Expertise in scalable supercomputing architectures, interconnects, and storage systems.
  • Proficiency in scripting (Python, Bash, Perl) and scientific computing (MPI, OpenMP, CUDA).
  • Experience with configuration management tools (Ansible, Puppet).
  • Familiarity with container technologies (Docker, Singularity, Kubernetes).
  • Strong troubleshooting, communication, and strategic planning skills.
  • Must be a United States citizen, permanent resident, or a person granted asylum or refugee status in accordance with 15 CFR, Part 762; 22 CFR §§122.5, 123.22 and 123.26; and 31 CFR § 501.601
  • All positions are security-sensitive.
  • Applicants are subject to a criminal history investigation, and employment is contingent upon the institution’s verification of credentials and/or other information required by the institution’s procedures, including the completion of the criminal history check.

Responsibilities

  • Manage large-scale HPC cluster operations, including OS upgrades, firmware patching, and performance tuning.
  • Oversee networking, security, and infrastructure for HPC systems.
  • Lead the development of specialized HPC computing clouds and scalable storage systems.
  • Collaborate with stakeholders to develop service-based solutions.
  • Serve as a strategic technical resource across departments.
  • Lead enterprise-wide HPC projects using established project management protocols.
  • Mentor junior system administrators and enforce performance standards.

Benefits

  • Benefit Programs
  • Retirement
  • Employee Discount Program
  • Flexible Spending Accounts
  • University Holidays
  • New Employee Onboarding
  • Prospective Employees
  • Safety and Security Notices
  • Training and Development