St. Jude is seeking an HPC Infrastructure DevOps Engineer II to join the High-Performance Computing Support (HPCS) team. This role is responsible for the smooth operation, automation, and continuous improvement of St. Jude’s high-performance computing environment, with a focus on HPC operations, DevOps practices, and automation for configuration, testing, monitoring, and autonomous remediation.

The position supports a modern research computing ecosystem spanning on-premises and remote-site infrastructure, including: HPC compute platforms for research and data-intensive workloads; GPU-enabled environments for AI and machine learning applications; high-capacity research, compliant, and scratch storage tiers; archival, backup, and disaster recovery services; and operational tooling for observability, governance, and process automation.

Working closely with infrastructure, storage, security, and research teams, the HPC Infrastructure DevOps Engineer II will deliver reliable and scalable services for computational science, regulated workflows, and AI-enabled research. This role is central to the HPCS service portfolio, which includes daily HPC client request fulfillment, performance and utilization monitoring, data management and governance, data cataloguing and archival services, and DevOps-driven HPC process automation.
Job Type: Full-time
Career Level: Mid Level
Number of Employees: 101-250