We are seeking a highly skilled Principal Engineer – HPC Operations to oversee the daily operations and support of high-performance computing clusters that power large-scale AI and ML workloads. This role keeps the infrastructure stable, secure, and high-performing using technologies such as Slurm, Kubernetes, and modern MLOps platforms. The ideal candidate will bring deep technical expertise in HPC and a strong operational mindset to drive continuous improvement and automation across globally distributed environments. Responsibilities also include collaborating with multidisciplinary teams, leading complex projects, implementing cutting-edge technologies, and mentoring operations engineers.
Job Type: Full-time
Career Level: Principal