FirstPrinciples is a research organization building AI infrastructure for discovery in fundamental science, focusing on systems like Theo, the AI Physicist. It is a fast-growing, remote-first team working across Canada, the US, and the UK, united by a shared curiosity about the universe and a belief in building systems that explore it more effectively. The work involves tackling abstract problems at the intersection of creativity and rigorous thinking, and it requires comfort with ambiguity and iteration.

This role is crucial for building and operating the compute foundation for AI-driven scientific discovery, ensuring that research and inference workloads are reliable, scalable, and fast. It involves designing, deploying, and operating Kubernetes clusters, Linux systems, GPU infrastructure, cloud environments, HPC-style compute, deployment workflows, monitoring, and automation. The goal is infrastructure that supports both experimentation and production-like inference across cloud, bare-metal, and hybrid environments.

The engineer will play a central role in shaping compute operations: provisioning and managing clusters, improving reliability and observability, reducing operational toil, supporting researchers and engineers, and making strategic decisions about infrastructure choices (managed cloud services, self-managed Kubernetes, Slurm-style systems, or owned hardware).

The ideal candidate is hands-on, systems-oriented, and comfortable in a fast-moving research environment, with strong Kubernetes and Linux fundamentals, good operational instincts, and experience with cloud and HPC/GPU infrastructure. They will apply these skills to build a robust bare-metal and multi-cloud inference platform.
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed