High Performance Computing (HPC) Engineer

Federal Reserve System · Kansas City, MO
$110,300 - $196,800 · Onsite

About The Position

The Center for the Advancement of Data and Research in Economics (CADRE) supports data-intensive and computationally intensive research and analytics for staff in the Economic Research division of the Federal Reserve Bank of Kansas City and across the Federal Reserve System. Our services include multiple high performance computing environments, research data warehousing, and advanced analytical tools. We are an embedded technology team within the division of Economic Research, Regional, and Community Affairs.

We are seeking an experienced High Performance Computing Engineer who can plan, implement, and maintain advanced cyberinfrastructure solutions. The ideal candidate will have deep expertise in HPC architectures, parallel computing frameworks, and scientific computing applications. You will work independently while collaborating with researchers to solve complex computational challenges that support critical economic research initiatives.

Requirements

  • Bachelor’s degree in computer science, engineering, mathematics, or related field, or equivalent combination of education and experience.
  • Minimum of 6 years of relevant experience in HPC administration and systems engineering.
  • Extensive experience with Linux operating systems (Red Hat/CentOS) in an HPC environment.
  • Strong command line skills and proficiency in scripting languages (Python, Bash).
  • Experience with job scheduling systems (SLURM, PBS, LSF) and resource management.
  • Knowledge of parallel file systems and storage technologies (e.g., Ceph, GPFS, Lustre, BeeGFS).
  • Familiarity with parallel programming models (MPI, OpenMP) and scientific computing frameworks.
  • Experience with configuration management and automation tools (Salt, Ansible, Puppet).
  • Demonstrated problem-solving abilities and analytical thinking.

Nice To Haves

  • Advanced degree in a computational field.
  • Experience with cloud computing platforms and hybrid HPC environments.
  • Experience with GitLab CI/CD pipelines for research software development.
  • Understanding of GPU computing and accelerator technologies (CUDA, OpenACC).
  • Experience supporting machine learning and AI workloads on HPC systems.

Responsibilities

  • Design, deploy, configure, and administer medium-scale HPC clusters and associated storage systems.
  • Monitor system health, performance metrics, and resource utilization to ensure optimal operation.
  • Implement robust security protocols and perform regular maintenance including upgrades and patching.
  • Troubleshoot complex hardware and software issues in a multi-user research environment.
  • Manage job scheduling and workload optimization using tools like SLURM.
  • Administer parallel file systems (such as Ceph and IBM Spectrum Scale/GPFS) and storage solutions.
  • Design and implement innovative HPC solutions to address evolving research requirements.
  • Create and maintain automation scripts and tools to streamline system administration.
  • Optimize scientific applications and computational workflows for performance.
  • Implement container technologies (Docker, Singularity) for reproducible research.
  • Support GPU computing and accelerator technologies for specialized workloads.
  • Define and track performance metrics to ensure resources are used efficiently now and in the future.
  • Partner closely with researchers to understand computational needs and translate them into technical solutions.
  • Collaborate with network, security, and data center teams to ensure integrated operations.
  • Build and maintain relationships with external vendors and technology partners.
  • Participate in the HPC community to stay current with emerging technologies and best practices.
  • Serve as a technical advisor on infrastructure planning and technology roadmaps.
  • Develop comprehensive documentation for systems, policies, and procedures.
  • Create user guides and training materials for researchers utilizing HPC resources.
  • Mentor junior staff and share knowledge across teams.
  • Conduct workshops and training sessions on effective use of HPC resources.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Number of Employees

501-1,000 employees
