Senior HPC Linux Systems Engineer

Oak Ridge National Laboratory, Oak Ridge, TN

About The Position

Oak Ridge National Laboratory (ORNL) is seeking a Senior HPC Linux Systems Engineer to serve as a technical leader supporting some of the most advanced computing environments in the world. This evergreen posting represents multiple potential openings for senior-level roles across ORNL's high-performance computing ecosystem. Senior HPC Linux Systems Engineers are recognized experts who lead the design, implementation, and optimization of complex HPC infrastructure. They manage large-scale technical projects, guide technical direction for their teams, and serve as trusted advisors to scientific and operational leadership across ORNL.

Requirements

  • Bachelor's degree in computer science, engineering, or a related technical field.
  • A minimum of 8 years of relevant experience in Linux systems administration or HPC systems engineering.

Nice To Haves

  • Demonstrated experience leading the design and deployment of HPC or large-scale distributed computing systems.
  • Expertise with batch schedulers (SLURM, PBS, LSF) and parallel file systems (Lustre, GPFS/Spectrum Scale).
  • Proven ability to lead technical projects from concept through implementation, balancing technical depth with project delivery.
  • Strong proficiency in automation and infrastructure-as-code frameworks (Ansible, Puppet, Salt).
  • Advanced scripting or programming skills (Python, Bash, Go) for automation and operational tooling.
  • In-depth understanding of high-speed interconnects (InfiniBand, Slingshot, Ethernet) and storage architectures.
  • Experience managing identity and access management systems, including MFA, SSO, and zero-trust frameworks (PingFederate, RSA SecurID, Entra ID).
  • Experience integrating virtualization or containerization solutions (VMware, KVM, Apptainer, Podman) into HPC environments.
  • Ability to manage client and stakeholder relationships across multiple directorates and technical disciplines.
  • Excellent written and verbal communication skills, including the ability to present complex technical concepts to diverse audiences.
  • Proven ability to influence technical strategy and mentor staff in a collaborative research environment.

Responsibilities

  • Provide technical leadership in the design, integration, and administration of large-scale Linux-based HPC clusters, high-speed networks, and storage systems.
  • Lead medium to large technical projects, coordinating requirements, schedules, and deliverables across internal and external stakeholders.
  • Architect and deploy advanced infrastructure solutions supporting exascale-class and mission-critical computing environments.
  • Serve as a technical mentor for HPC engineers, guiding best practices in automation, performance tuning, and system security.
  • Develop, implement, and maintain configuration management and automation frameworks (e.g., Ansible, Puppet, Salt) to enhance reliability and reproducibility.
  • Perform advanced system performance analysis, troubleshooting, and optimization, ensuring system scalability and long-term sustainability.
  • Manage critical vendor and partner relationships, representing ORNL's technical requirements during procurement, integration, and system acceptance.
  • Contribute to strategic planning and technology roadmaps, influencing unit goals and technical direction.
  • Collaborate closely with scientists, researchers, and IT specialists to align infrastructure capabilities with research and security objectives.
  • Ensure compliance with DOE cybersecurity standards, configuration baselines, and operational policies.
  • Author technical documentation, present internal briefings, and communicate complex issues and resolutions to management and stakeholders.
  • Participate in on-call rotations, maintenance windows, and incident response as needed to support 24x7 operations.

Benefits

  • Work on the world's most powerful supercomputers, including Frontier, the first system to achieve exascale performance.
  • Enable breakthrough science in fields like fusion energy, climate modeling, AI, and national security.
  • Collaborate with diverse teams of scientists, engineers, and technologists from across the DOE complex and academia.
  • Grow your career in a mission-driven, innovation-focused environment with access to professional development and leadership opportunities.
  • Enjoy life in East Tennessee, with a thriving research community, scenic outdoor recreation, and a high quality of life.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Industry: Professional, Scientific, and Technical Services
  • Number of Employees: 5,001-10,000 employees
