Senior HPC Engineer

Texas A&M University System | College Station, TX
24d | $125 - $136 | Onsite

About The Position

We are making a bold leap into the future of artificial intelligence with a $45 million investment in an NVIDIA DGX SuperPOD. This investment underscores our commitment to meeting the cutting-edge research and supercomputing needs of faculty and staff across all Texas A&M System members. As a Senior High Performance Computing (HPC) Engineer, you will provide technical expertise and consultation for the design and deployment of HPC systems. Get in on the ground floor with a team that is shaping the next generation of innovation. This position is security sensitive and requires U.S. citizenship.

Requirements

  • Bachelor’s degree in applicable field or equivalent combination of education and experience
  • 12 years of related experience

Nice To Haves

  • Experience with High Performance Computing (HPC) environments
  • Advanced Linux system administration skills
  • Familiarity with computer networking concepts and protocols
  • Experience with container orchestration tools such as Kubernetes
  • Knowledge of Run:ai for AI workload management
  • Proficiency with Slurm workload manager
  • Experience working with NVIDIA DGX systems
  • Understanding of virtualization technologies
  • Familiarity with Infrastructure as a Service (IaaS) platforms
  • Experience with DDN storage solutions
  • Knowledge of network-attached storage systems

Responsibilities

  • Manage large-scale HPC cluster operations, including OS upgrades, firmware patching, and performance tuning.
  • Oversee networking, security, and infrastructure for HPC systems.
  • Lead the development of specialized HPC computing clouds and scalable storage systems.
  • Collaborate with stakeholders to develop service-based solutions.
  • Serve as a strategic technical resource across departments.
  • Lead enterprise-wide HPC projects using established project management protocols.
  • Mentor junior system administrators and enforce performance standards.

Benefits

  • Health, dental, vision, life and long-term disability insurance with Texas A&M contributing to employee health and basic life premiums
  • 12-15 days of annual paid holidays
  • Up to eight hours of paid sick leave and at least eight hours of paid vacation each month
  • Automatic enrollment in the Teacher Retirement System of Texas
  • Health and Wellness: Free exercise programs and release time
  • Professional Development: All employees have access to free LinkedIn Learning training, webinars, and limited financial support to attend conferences, workshops, and more
  • Educational release time and tuition assistance for completing a degree while a Texas A&M employee