Research Platform Engineer (HPC)

General Fusion Inc.
Richmond, BC
CA$126,000 - CA$154,000

About The Position

General Fusion's research relies heavily on experimental data and computer simulation to design and operate its experimental devices. We're seeking a versatile technical lead to support the infrastructure that empowers our scientists, including managing our High-Performance Computing (HPC) environment and contributing to our research data infrastructure.

This is a dual role. As the HPC Administrator, half of your time will be spent ensuring our compute cluster is stable, optimized, and serving the needs of the science teams; the system runs Rocky Linux and comprises 70 compute nodes and 1 PB of storage. The other half of your time will be spent contributing to our on-prem data systems that transform and serve our experimental data, with a focus on moving toward modern data architecture patterns and technologies.

This role will help shape the computational research infrastructure at a scientific R&D startup. You'll have opportunities to propose architectural changes, reduce complexity, and build out systems that directly accelerate scientific discovery. If you're energized by working at the intersection of infrastructure, data, and scientific computing, this role is for you.

Requirements

  • Degree in Computer Science, Computer Engineering, Engineering Physics or related field
  • 5+ years of professional experience in an applied R&D environment, working in scientific computing and/or research data infrastructure
  • 2+ years of experience managing HPC clusters, with a solid understanding of InfiniBand, MPI/parallel computing concepts, storage architectures, and workload scheduling (SLURM)
  • 2+ years of platform or data engineering, specifically building systems that serve technical users
  • Experience across the modern Linux systems lifecycle, including OS administration (e.g. Rocky, Ubuntu, RHEL), container orchestration (Apptainer/Singularity, Docker), and declarative infrastructure to ensure environment reproducibility
  • Proficiency in low-level resource management (CPU/memory/IO) and system-level performance tuning
  • Experience implementing alerting, logging, and monitoring tools to track system health and performance (Prometheus, Grafana, or similar)
  • Experience with data pipelines and workflow orchestration, such as task queues, message brokers (e.g. RabbitMQ, Redis), workflow engines (e.g. Airflow, Prefect, Celery), or DAG-based processing
  • Professional Python development experience, including git/GitHub and code review practices
  • Excellent verbal and written communication skills; experience writing technical documentation
  • Proactive and collaborative: comfortable taking ownership, proposing solutions, and acting as a bridge between development, IT, and research teams

Nice To Haves

  • Experience in a multidisciplinary research or R&D environment, with a background in physics, math, or advanced analytics
  • Good understanding of standard protocols like NFS, SMB, LDAP, DHCP and NTP
  • Database experience, including NoSQL (MongoDB)
  • Experience with big data tools and frameworks, such as modern 'lakehouse' patterns (e.g. Spark, Iceberg, Polars), high-performance analytical formats (Parquet, HDF5), and distributed OLAP engines (e.g. ClickHouse, DuckDB)
  • Experience with data versioning systems (e.g. DVC, LakeFS) and reproducible research best practices

Responsibilities

  • Act as the primary source of HPC expertise within General Fusion
  • Administer the cluster, including maintaining the OS and software environment, provisioning and allocating resources, managing the job scheduler (SLURM), managing user accounts, and monitoring system health and performance
  • Provide training and support for HPC users
  • Collaborate with IT on networking and physical infrastructure; ensure alignment with IT policies, security standards, and corporate governance requirements, including applicable SOX controls
  • Design high-performance data architectures for storage, retrieval and analysis of complex research datasets; contribute to data versioning, result reuse, and metadata cataloging systems
  • Contribute to the modernization of data processing pipelines, with an eye toward simplification and maintainability
  • Proactively monitor system health and performance across both compute nodes and data pipelines
  • Seek opportunities to consolidate tooling and reduce operational overhead
  • Act as a bridge between traditional HPC computing and modern data platform patterns, helping integrate simulation data with experimental data systems
  • Maintain and improve technical documentation
  • Contribute to strategic planning and decision-making to help drive the evolution of General Fusion’s data systems

Benefits

  • Flexible hours
  • Four weeks’ vacation
  • Comprehensive benefits package
  • RRSP Contribution – No Employee Match Needed!
  • Support for professional development
  • Great company culture – social events, food trucks, bike rides, Sun Run, etc.