HPC/AI Programming Environment Engineer

Lawrence Berkeley National Laboratory, Berkeley, CA
4d ago · Hybrid

About The Position

Lawrence Berkeley National Lab's (LBNL) National Energy Research Scientific Computing Center (NERSC) Division has an opening for an HPC/AI Programming Environment Engineer to join the Programming Environments and Models (PEM) Group. The PEM group focuses on research, development, and engineering for scalable and efficient HPC software, working across programming models, toolchains, development environments, and runtimes. The team collaborates with vendors, open source projects, DOE community efforts, and standards bodies to enable current and future science on NERSC systems. You will contribute to advancing the AI and HPC software environment on NERSC's flagship systems, including Perlmutter and Doudna, supporting Berkeley Lab's role in the DOE Genesis Mission while enabling next-generation programming environments critical to NERSC's evolving infrastructure. The selected candidate will be hired at the Computer Systems Engineer 3 or 4 (CSE3 or CSE4) level depending on their skills and experience.

Requirements

  • Bachelor's degree in Computer Science, Computational Science, Physical Sciences, or related field with a minimum of 8 years of related experience; or Master's degree with 6 years of experience; or equivalent experience.
  • Experience with HPC software stacks and/or AI/ML frameworks such as PyTorch, TensorFlow, JAX, or similar technologies.
  • Experience with state-of-the-art languages, methods, and tools used to program, profile, and debug parallel scientific applications and workflows, such as MPI, OpenMP, CUDA, C++, Rust, Python, or Fortran.
  • Knowledge of the Linux environment.
  • Excellent written and oral communication skills.
  • Demonstrated ability to work effectively as part of a cross-disciplinary team.
  • For the CSE4 level: minimum of 12 years of related experience with a Bachelor's degree; or 8 years with a Master's degree; or equivalent experience.
  • Track record of technical leadership or leading collaborative projects.
  • Recognized expertise and established professional network in HPC or related fields.

Nice To Haves

  • Ph.D. in Computer Science, Computational Science, Physical Sciences, or related field.
  • Experience with production HPC environments and deploying services at scale.
  • Experience with high-performance interconnects and distributed communication libraries for HPC and AI workloads, such as MPI, NCCL, libfabric, or UCX.
  • Experience with container technologies (e.g., Docker, Podman, Singularity/Apptainer) and their application in HPC environments.
  • Experience with hardware and software technologies in emerging areas such as cloud computing, AI accelerators, and their application to HPC.
  • Demonstrated track record of contributions to relevant open source projects, software standards, or community initiatives.
  • Nationally or internationally recognized expertise in an HPC-related discipline.

Responsibilities

  • Develop, integrate, and support software frameworks and tools that enable HPC/AI workloads within the NERSC HPC software environment on Perlmutter, Doudna, and future systems.
  • Enable and optimize software environment technologies, including runtime integration, testing, and development of advanced capabilities for Doudna (NERSC-10), a state-of-the-art NVIDIA Vera Rubin system integrated by Dell.
  • Serve as a liaison with NESAP science teams to understand workflow requirements and ensure programming environments meet the needs of scientific workloads.
  • Collaborate with vendors to prioritize, develop, and enhance their technologies to meet the needs of DOE Office of Science application codes and workflows.
  • Evaluate emerging technologies for their applicability to NERSC's scientific workloads.
  • Measure and analyze performance and scalability of software frameworks and runtimes on current and future platforms.
  • Contribute software engineering expertise to cross-team NERSC activities and collaborate across Berkeley Lab and the DOE Office of Science community.
  • Prepare technical documentation, reports, papers, presentations, and training materials describing significant results for dissemination within NERSC and the broader research community.
  • Work directly with scientists and developers to ensure correct and optimal usage of software technologies and that future development meets their requirements.
  • Provide technical leadership and mentorship within the PEM group and across NERSC.
  • Lead development and deployment efforts for major programming environment initiatives.
  • Represent NERSC in vendor engagements, standards bodies, and the broader HPC community.
  • Work with greater independence and drive strategy for areas of responsibility.

Benefits

  • Exceptional health and retirement benefits, including pension or 401(k)-style plans
  • Opportunities to grow in your career - check out our Tuition Assistance Program
  • A culture where you’ll belong - we are invested in our teams!
  • In addition to accruing vacation and sick time, we also have a Winter Holiday Shutdown every year.
  • Parental bonding leave (for both mothers and fathers)
  • Pet insurance