CoreWeave runs some of the largest GPU clusters in the world. The AI infrastructure behind those clusters determines how workloads are placed, how resources are shared, and how reliably systems perform under constant pressure. As a Principal Engineer in AI Infrastructure, you will lead the design and evolution of the cluster orchestration systems that make this possible. This includes Slurm, Kubernetes, SUNK, and the control planes that support AI training, inference, and model onboarding at scale. You will define long-term architecture, solve hard scaling problems, and set technical direction across teams. Your work will directly affect how quickly customers can run models, how efficiently we use GPUs, and how reliably the platform behaves at scale.
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Principal