Senior Manager, High Performance Compute

IonQ, Boulder, WA
$200,698 - $26,276, Hybrid

About The Position

IonQ is developing the world’s most powerful full-stack quantum computer based on trapped-ion technology. We are pushing past the limits of classical physics and current supercomputing technology to unlock a new era of computing. Quantum computing has the potential to impact every area of human society for the better. IonQ’s computers will soon redefine industries like medicine, materials science, finance, artificial intelligence, machine learning, cryptography, and more. IonQ is at the forefront of this technological revolution.

At IonQ, we are building the world’s most powerful quantum computers. But before a quantum circuit runs on trapped ions, it often lives as a massive simulation on classical hardware. We are seeking a Senior Manager, High Performance Compute to lead the team responsible for the hybrid HPC computational platform that powers our physics simulations, hardware verification, and quantum algorithm development.

This is a Player-Coach role for a technical leader who refuses to lose their edge. You will manage a team of talented engineers while remaining hands-on with the technology. One day you might be hiring an HPC engineer, and the next you might be debugging a race condition in a Slurm scheduler or profiling a simulation kernel for GPU efficiency. You will sit at the intersection of classical supercomputing and quantum simulation, building the hybrid infrastructure that allows us to push past the limits of classical physics.

Requirements

  • Bachelor’s degree in Computer Science, Physics, Engineering, or equivalent practical experience
  • 7+ years of HPC experience, with deep expertise in Linux systems administration and cluster management
  • 3+ years of experience leading engineering teams, managing backlogs, and conducting performance reviews, with a desire to remain hands-on
  • 3+ years of experience with Slurm. You know how to configure fair-share scheduling, backfill, and preemption
  • Proven experience deploying HPC clusters in the public cloud (AWS, Azure, or GCP) using tools like AWS ParallelCluster, Batch, or equivalent
  • Strong proficiency in Python and Bash. You treat infrastructure as code (e.g., Ansible, Terraform, Packer)

Nice To Haves

  • 10+ years of HPC experience
  • 5+ years of experience in engineering management
  • Experience running and optimizing large-scale scientific simulations (e.g., molecular dynamics, CFD, or electronic design automation)
  • Understanding of MPI (Message Passing Interface) and GPU acceleration (CUDA/ROCm) frameworks
  • A background in Physics or experience with quantum simulation software (e.g., Qiskit, Cirq, or proprietary solvers)
  • Experience with high-performance parallel file systems (Lustre, GPFS/Spectrum Scale, or WEKA)

Responsibilities

  • Lead, mentor, and grow a team of HPC engineers.
  • Foster a culture of technical rigor where "it works" isn't enough; it has to be performant.
  • Own the strategy for our hybrid HPC environment. Balance workloads between our on-premises clusters and burst capacity in the cloud to maximize simulation throughput per dollar and create a fantastic user experience.
  • Partner directly with quantum physicists and applications teams to understand their simulation needs. Translate complex scientific requirements into concrete infrastructure roadmaps. Deliver on those roadmaps.
  • Manage relationships with hardware vendors and cloud providers, negotiating for the specialized compute instances (e.g., H200s, high-memory nodes) required for our workloads.
  • Architect and tune our job schedulers (leveraging Slurm) to handle massive, spiky workloads involving many concurrent simulation jobs.
  • Dive deep into the stack to optimize I/O patterns, memory usage, and parallelization strategies.
  • Build and maintain the "glue" that allows users to submit jobs seamlessly to on-prem hardware or cloud clusters.
  • Troubleshoot complex failures in simulation pipelines, distinguishing between infrastructure issues and algorithmic bugs, and provide a best-in-class user experience for submitting, running, and troubleshooting jobs.

Benefits

  • comprehensive medical, dental, and vision plans
  • matching 401K
  • unlimited PTO and paid holidays
  • parental/adoption leave
  • legal insurance
  • a home internet stipend
  • pet insurance