We’re seeking an Infrastructure Engineer to develop and manage scalable infrastructure to support our research workloads. You will own our existing Kubernetes cluster, deployed on top of bare-metal H100 cloud instances. You will oversee and enhance the cluster to 1) support new workloads, such as multi-node LoRA training; 2) new users, as we double the size of our research team in the next twelve to eighteen months; and 3) new features, such as fine-grained experiment compute usage tracking. You will be the point-person for cluster-related work. You will work on the Foundations team alongside experienced engineers, including those who built and designed the cluster, who can provide guidance and backup. However, as our first dedicated infrastructure hire, you will need to work autonomously, design solutions to varied and complex problems, and communicate with researchers who are technically skilled but less knowledgeable about our cluster and infrastructure. This is an opportunity to build the technical foundations of the largest independent AI safety research institute, with one of the most varied research agendas. You will be working directly with both the Foundations team and researchers across the organization to enable bleeding-edge research workloads across our research portfolio.