HPC User Support Engineer

Argonne National Laboratory · Lemont, IL
1d · $86,299 - $166,070 · Remote

About The Position

As a member of the User Experience team at the Argonne Leadership Computing Facility (ALCF), the HPC User Support Engineer will play a critical role in providing technical support for ALCF’s high-performance computing (HPC) systems and services. This role focuses on ensuring a seamless user experience for researchers using ALCF resources. The successful candidate will work closely with leading scientific researchers from the Department of Energy complex, academia, and industry to resolve technical issues, create user documentation, and deliver training sessions. This position also involves stewarding users through ALCF allocation programs and improving workflows to enhance system usability and efficiency.

This position is eligible for fully remote work.

Requirements

  • Experience working in an HPC center supporting user codes
  • Experience working with parallel codes using MPI implementations and OpenMP
  • Experience with common machine learning frameworks
  • Experience with source code management systems like Git, and CI tools like Jenkins or GitLab
  • Experience with DBMS
  • Experience with containerization
  • Experience developing technical training documentation for users
  • Strong interest in emerging technologies and applications
  • Ability to work on multiple concurrent projects efficiently and effectively
  • Highly motivated and user focused
  • Ability to model Argonne's core values of impact, safety, respect, integrity, and teamwork.
  • To perform the essential functions of this position, successful applicants must provide proof of U.S. citizenship, which is required to comply with federal regulations and contract.
  • PT3: Bachelor's and 4+ years of experience, or Master's and 2+ years of experience, or equivalent
  • PT4: Bachelor's and 6+ years of experience, or Master's and 4+ years of experience, or equivalent

Responsibilities

  • Managing and resolving technical issues
  • Debugging, installing, compiling, and running large-scale user applications
  • Supporting users in writing scripts for automated execution and optimizing workflows
  • Assisting with job scheduling, debugging, and troubleshooting HPC-related challenges
  • Collaborating with ALCF domain experts to provide resolutions to user requests and technical issues
  • Developing and maintaining documentation, both internally and on user-facing websites
  • Conducting training sessions and onboarding new users to ensure effective utilization of ALCF resources
  • Enabling secure access to HPC systems and ensuring compliance with ALCF policies
  • Providing support in AI technologies, including machine learning frameworks, and assisting users in deploying and optimizing AI workflows on HPC systems

Benefits

  • Comprehensive benefits are part of the total rewards package