HPC System Administrator

Advanced Micro Devices, Inc., Austin, TX

About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE: AMD’s Software and Solutions Team is seeking a High Performance Computing (HPC) System Administrator for AMD CPU and GPU platforms. This role requires strong foundational expertise in HPC system administration and supports AMD-internal HPC GPU systems used for application performance work, GPU programming, and scientific simulation. As part of AMD’s HPC and AI Enablement Team, the System Administrator will collaborate directly with team members across the organization to keep systems available to users so they can modernize, port, and tune applications for supercomputing and AI-enabled workflows. The role spans daily system administration tasks, including granting and revoking access and investigating and correcting reported system defects, all while adhering to the required service levels.

THE PERSON: A highly motivated and passionate HPC system administrator or integration engineer with deep expertise in HPC systems. This individual thrives in fast-paced, highly technical, cross-functional environments, collaborating with researchers, scientific software teams, and HPC groups that test, port, optimize, and scale complex simulation and modeling codes across domains such as finite element analysis, computational chemistry, weather modeling, fluid dynamics, and energy systems. They will apply advanced systems engineering techniques across CPU and GPU systems. The ideal candidate is a proficient Linux system administrator who develops tools for system monitoring and deployment automation in Bash or Python, is experienced with HPC build systems and revision control practices, and is comfortable supporting geographically distributed teams. They must be self-motivated, collaborative, and able to work effectively in a team environment.

Requirements

  • Proven experience in integration projects for HPC and Machine Learning environments
  • Bash or Python programming skills
  • Expertise with autotools, make, CMake, and containers (Docker, Singularity)
  • Strong team software development skills, including demonstrated expertise with Git, Jenkins, Jira, and similar tools
  • In-depth knowledge of software development practices including debug, test, revision control, documentation, and bug tracking
  • Experience with virtualization and cloud computing
  • Linux administration, scripting expertise, cluster tools
  • Outstanding interpersonal and communication skills

Responsibilities

  • Administer internal HPC GPU machines
  • Explore opportunities to improve the product
  • Work closely with other team members to understand design architecture and to propose solutions to improve and enhance products

Benefits

  • AMD benefits at a glance.