High Performance Compute - Performance Engineer

GE Vernova • Greenville, NC

1d • Onsite

About The Position

We are seeking an HPC Performance & Reliability Engineer to join our software engineering team supporting the development of gas turbine design tools. This is a critical individual contributor role responsible for profiling, benchmarking, and optimizing the performance of a diverse portfolio of engineering applications running on a hybrid HPC environment, with a majority of compute in AWS. The successful candidate will establish best practices for job configuration, lead scaling studies, coordinate SLURM job launch configurations, and proactively monitor HPC resource usage to ensure a reliable and efficient compute environment for our internal engineering users. This role works in close partnership with the IT team and serves as the technical focal point for HPC performance, coordinating efforts across bubble assignment contributors and maintaining the documentation and standards that guide our user community.

Requirements

Bachelor's degree in Computer Science or a STEM major (Science, Technology, Engineering, and Math) with a minimum of 8 years of experience
This role requires use of technical data subject to U.S. Government export restrictions and this posting is only for U.S. Persons (U.S. Citizens, lawful permanent residents and protected individuals (e.g., certain refugees and asylees)). GE will require proof of status prior to employment.

Nice To Haves

Understanding of HPC architectures, job scheduling concepts, and parallel computing paradigms
Hands-on experience with SLURM — configuration, job script development, scheduler tuning, and troubleshooting
Ability to interpret profiling data and translate findings into concrete configuration and code-level recommendations
Experience benchmarking and conducting scaling studies (strong/weak scaling analysis)
Proficiency in Python and shell scripting for automation and data analysis of job logs and performance metrics
Working knowledge of Fortran and/or C++ sufficient to understand application structure and interpret profiling output — development expertise not required
Familiarity with ANSYS and/or Siemens simulation products (FEA/CFD solvers such as ANSYS Mechanical, Fluent, or Siemens STAR-CCM+) and their HPC deployment and licensing models is strongly preferred
Strong written and verbal communication skills with the ability to convey technical findings to both engineering users and IT stakeholders
This role requires access to U.S. export-controlled information. If applicable, final offers will be contingent on ability to obtain authorization for access to U.S. export-controlled information from the U.S. Government.

Responsibilities

Evaluate, select, and deploy profiling tools appropriate for a mixed application environment including Fortran/C++/Python applications using OpenMP and MPI parallelism, as well as third-party commercial solvers (ANSYS and Siemens FEA/CFD products)
Conduct systematic profiling of engineering applications to identify performance bottlenecks, inefficient resource utilization, and optimization opportunities
Perform scaling studies across varying job sizes and processor counts to characterize how different application types and problem sizes perform as compute resources scale
Develop and maintain a library of performance benchmarks and profiling results for key applications in the portfolio
Translate profiling findings into actionable recommendations for job configuration and resource allocation
Own and maintain SLURM job launch scripts and configurations, working directly with the IT team to implement and validate changes
Determine and document optimal job settings including CPU/memory allocation, AWS instance type selection, MPI/OpenMP thread configurations, and SLURM scheduler parameters for different application types and job sizes
Coordinate with IT to ensure SLURM configurations reflect current HPC infrastructure capabilities and AWS environment changes
Serve as the technical focal point for bubble assignment contributors working through the profiling backlog, establishing standards and reviewing their outputs
Proactively engage with new and existing user groups to understand their workflows, application usage patterns, and compute requirements
Analyze job logs and scheduler data to identify new workload types, unusual resource consumption patterns, or uncharacterized applications that require profiling
Work with users and team leads to prioritize profiling and optimization efforts based on business impact and resource consumption
Maintain a current understanding of the full portfolio of job types submitted to the HPC environment
Communicate profiling results and scaling study findings directly to engineering users in a clear and actionable format
Develop and maintain documentation of recommended job configurations, core count selection guidance, and best practices tailored to specific application types and job sizes
Educate users on how to select appropriate compute resources based on job type, problem size, and performance tradeoffs
Establish and maintain best practice standards for job submission across the user community
Define the data requirements and key metrics needed to support HPC monitoring dashboards, partnering with the dashboard development resource to ensure operational visibility
Actively monitor HPC usage and resource metrics to detect anomalies including abnormal resource consumption by new or existing users, elevated job failure rates, increased queue times, unusually low utilization, and node availability issues
Investigate anomalies proactively, resolving or escalating issues before they impact users
Maintain proactive communication with users and stakeholders when issues are identified or changes are planned

Benefits

medical, dental, vision, and prescription drug coverage
access to Health Coach from GE Vernova, a 24/7 nurse-based resource
access to the Employee Assistance Program, providing 24/7 confidential assessment, counseling and referral services
GE Vernova Retirement Savings Plan, a tax-advantaged 401(k) savings opportunity with company matching contributions and company retirement contributions, as well as access to Fidelity resources and financial planning consultants
tuition assistance
adoption assistance
paid parental leave
disability benefits
life insurance
12 paid holidays
permissive time off