Systems Engineer, HPC (US & Canada)

Mistral AI
Toronto, NY
Hybrid

About The Position

About Mistral

At Mistral AI, we build high-performance, open, and efficient AI systems designed to power the next generation of applications. Our infrastructure combines large-scale distributed systems, cloud platforms, and HPC environments to support cutting-edge research and production workloads.

We are a collaborative, low-ego, and highly technical team, operating across Europe, the US, and beyond. As we scale rapidly, we are building the foundational infrastructure to support thousands of nodes and petabyte-scale systems.

Join us to be part of a pioneering company shaping the future of AI. Together, we can make a meaningful impact. See more about our culture at https://mistral.ai/careers.

About the Role

We are looking for Systems Engineers / System Administrators to help design, operate, and scale the infrastructure behind Mistral’s AI platforms. This is a hands-on, hybrid role combining:

  • Systems administration (operating and troubleshooting large-scale Linux environments)
  • Systems engineering (automation, scalability, and performance improvements)

You’ll work closely with infrastructure, HPC, and research teams to ensure our clusters and platforms run reliably at scale.

Requirements

  • Strong Linux systems administration experience (core requirement)
  • Experience working in large-scale environments: HPC clusters or cloud infrastructure
  • Experience with job schedulers (e.g. Slurm)
  • Solid troubleshooting skills across systems, hardware, and networks

Nice To Haves

  • Containers / orchestration (e.g. Kubernetes)
  • Storage systems (e.g. Ceph, Lustre, NFS)
  • Networking fundamentals (Ethernet; InfiniBand is a plus)
  • Infrastructure as Code / automation tooling
  • GPU or AI/ML experience

Responsibilities

  • Operate and maintain large-scale Linux environments (bare metal, clusters, cloud)
  • Monitor system health, troubleshoot incidents, and ensure high availability
  • Support production and research workloads across multiple environments
  • Help scale clusters toward hundreds to thousands of nodes
  • Work on systems handling petabyte-scale storage
  • Improve performance, reliability, and resource utilisation
  • Automate operational tasks using tools like Python, Bash, Ansible, or Terraform
  • Improve deployment, provisioning, and system lifecycle management
  • Contribute to system design and architecture decisions
  • Work closely with HPC / infrastructure teams, Platform / DevOps engineers, and Research teams
  • Act as a bridge between users and infrastructure

Benefits

  • Competitive compensation
  • Benefits