St. Jude Children’s Research Hospital is a world-class pediatric research hospital dedicated to advancing cures and means of prevention for pediatric catastrophic diseases through research and treatment. It is recognized as a 'Best Place to Work' for its collaborative, positive environment, where professionals are supported in advancing their careers and contributing to finding cures. The hospital leads the world in understanding, treating, and defeating childhood cancer and other life-threatening diseases.

St. Jude is seeking an HPC Infrastructure DevOps Engineer II to join the High-Performance Computing Support (HPCS) team. This role is crucial to the smooth operation, automation, and continuous improvement of St. Jude’s high-performance computing environment, with a strong emphasis on HPC operations, DevOps practices, and automation for configuration, testing, monitoring, and autonomous remediation.

The position supports a modern research computing ecosystem that spans both on-premises and remote-site infrastructure. This ecosystem includes HPC compute platforms for research and data-intensive workloads; GPU-enabled environments for AI and machine learning applications; high-capacity storage tiers (research, compliant, and scratch); and archival, backup, and disaster recovery services. The role also covers operational tooling for observability, governance, and process automation.

The HPC Infrastructure DevOps Engineer II will collaborate closely with infrastructure, storage, security, and research teams to deliver reliable and scalable services for computational science, regulated workflows, and AI-enabled research. The position is central to the HPCS service portfolio, which encompasses daily fulfillment of HPC client requests, performance and utilization monitoring, data management and governance, data cataloguing and archival services, and DevOps-driven HPC process automation.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
101-250 employees