Systems Analyst (Site Reliability Engineer)

Hewlett Packard Enterprise

$115,500 - $266,000

About The Position

We are seeking a skilled Systems Analyst (Site Reliability Engineer) at HPE to support Oak Ridge National Laboratory (ORNL). This is a unique, on-site, customer-facing opportunity to work with some of the world's most advanced high-performance computing (HPC) systems, including Frontier, the world's first exascale supercomputer. As part of our team, you will play a critical role in deploying, maintaining, and optimizing large-scale computing infrastructure, both software and hardware, ensuring system reliability for cutting-edge scientific research.

Requirements

Due to the nature of the work, this position requires either U.S. Citizenship or U.S. Lawful Permanent Resident (LPR) status.
Education: A Bachelor’s degree in Computer Science, Computer Engineering, or a related field with at least 2 years of experience, OR a Master’s degree in Computer Science, Computer Engineering, or a related field.
HPC System Experience: Experience using SLURM-based HPC systems, both as a user and preferably as a system administrator.
Technical Skills: Proficient in Linux, Python, and Bash scripting. Familiarity with C++/Fortran-based HPC application development, GPUs, MPI, and high-performance computing tools.
Application Build and Configuration Knowledge: Strong understanding of application build processes, including compiler configurations, library integration, and dependency management, to effectively set up environments, perform upgrades, and troubleshoot build and runtime issues.
Log Analysis: Experience in large-scale log analysis and in troubleshooting performance problems, bugs, or system failures.
Communication Skills: Strong written and verbal communication skills, with the ability to document and share knowledge effectively with internal teams and end-users.
Industry Knowledge: Familiarity with emerging HPC trends, system architectures, and optimization strategies.

Nice To Haves

Accountability
Active Learning
Active Listening
Bias
Business Growth
Client Expectations Management
Coaching
Creativity
Critical Thinking
Cross-Functional Teamwork
Customer Centric Solutions
Customer Relationship Management (CRM)
Design Thinking
Empathy
Follow-Through
Growth Mindset
Information Technology (IT) Infrastructure
Infrastructure as a Service (IaaS)
Intellectual Curiosity
Long Term Planning
Managing Ambiguity
Process Improvements
Product Services
Relationship Building

Responsibilities

Maintain and optimize compute infrastructure across multiple large-scale HPC systems.
Participate in the deployment, testing, and validation of live high-performance computing clusters.
Troubleshoot node failures by analyzing OS internals, compiler behavior, and system logs, coordinating with internal subject-matter experts as needed.
Conduct routine and on-demand maintenance, troubleshooting, and performance tuning for large-scale HPC environments.
Collaborate with researchers, engineers, and technical staff to open, maintain, and close JIRA tickets, ensuring system reliability and efficiency for high-stakes, high-performance scientific research.
Investigate and document complex software and system-level issues, acting as a bridge between users and HPE internal teams.
Develop and implement automation tools, scripts, and monitoring solutions to streamline system management.
Stay up to date with advancements in HPC technologies, including GPU acceleration (e.g., ROCm), parallel programming environments and models (e.g., Cray PE, MPI/OpenMP), and performance tuning.

Benefits

Health & Wellbeing: Comprehensive suite of benefits that supports physical, financial and emotional wellbeing.
Personal & Professional Development: Programs catered to helping you reach career goals.
Unconditional Inclusion: A culture that celebrates individual uniqueness and values varied backgrounds.

What This Job Offers

Education Level

Bachelor's degree