AI Safety Lab Lead (RCO)

National Research Council Canada • Ottawa, ON

10d • CA$119,688 - CA$168,192 • Hybrid

About The Position

We are seeking a mid-career RCO-04 to establish and lead the NRC’s AI Safety Lab, a new capability funded by the Canadian AI Safety Institute (CAISI) and dedicated to evaluating the safety of advanced AI systems. You will be part of a pioneering team conducting rigorous AI safety evaluations and translating technical results into actionable tools, guidance, and recommendations for AI practitioners and policymakers across government and beyond. Under the guidance of the Principal Advisor for AI Safety, and as part of the Chief Digital Research Officer’s team, you will play a central role in building and operating a scalable AI safety evaluation infrastructure. In collaboration with leading experts from our Digital Technologies Research Centre, you will design, execute, and monitor advanced evaluations at scale on frontier AI models, and contribute to cutting-edge AI safety research carried out by the NRC in collaboration with CAISI partners nationally and internationally.

Requirements

Master’s degree from a recognized university in Statistics, Data Science, Computer Science, Engineering, or a field closely related to the position. Equivalency: In lieu of a master’s degree, a Bachelor’s degree combined with at least 2 years of directly relevant professional experience in applied AI or AI evaluation may be considered.
Recent experience leading or managing technical teams in an AI/ML, data analytics, or digital technologies environment.
Recent experience evaluating, benchmarking, or red-teaming AI models (open-source or closed-source) or AI systems.
Experience in managing AI/ML or data analytics projects for government or industry clients.
Experience developing Python-based tooling and automation for ML/AI workflows.
Experience in documenting experimental AI results through technical reports, internal documentation, or scientific publications.
Experience with red-teaming workflows, adversarial testing, or stress-testing of AI systems (including agentic systems).
Experience with compute clusters and job orchestration (e.g., SLURM).
Experience with ML experiment tracking tools (e.g., MLflow, Weights & Biases).
Experience developing or curating AI benchmark datasets and associated documentation (e.g., through HuggingFace).
Experience translating AI research into practical tools, guidance, or policy recommendations.
Experience conducting AI projects with researchers or through international collaboration.
Ability to lead the development of reproducible evaluation protocols and benchmarks for frontier AI models and systems.
Ability to communicate complex technical findings clearly to researchers, government stakeholders, and policy audiences.
Knowledge of AI safety concepts, risk taxonomies, or governance frameworks (e.g., TBS guidelines, NIST AI Risk Management Framework) and their connection to technical evaluation.
Ability to design and oversee scalable AI evaluation infrastructure across on-premises and cloud environments.
Knowledge of AI safety evaluation frameworks (e.g., Inspect AI, Moonshot) and their application to frontier model assessment.
Ability to make sound technical decisions when configuring evaluations across different model types, balancing compute budgets, reproducibility, and result validity.
Ability to manage and prioritize competing technical workstreams across a multidisciplinary team.
Knowledge of agentic AI architectures and how design choices such as tool access, memory, and planning affect safety-relevant behaviours.
Ability to design experiment tracking and data management workflows that support reproducibility across multiple researchers and evaluation campaigns.
Research - Communication (Level 3)
Research - Results orientation (Level 3)
Supervisor - Client focus (Level 2)
Supervisor - Teamwork (Level 3)
Supervisor - Organizational/environmental awareness (Level 2)
Management services - Conceptual and analytical ability (Level 3)

Nice To Haves

A track record of building and managing AI teams.
Hands-on experience evaluating frontier AI models using safety evaluation frameworks (e.g., Inspect AI, Moonshot).
A strong background in compute cluster operations and scalable ML infrastructure.
Experience designing reproducible evaluation protocols and benchmarks.
Experience translating research into practical tools, guidance, or policy recommendations.
The ability to collaborate across research, government, and international partners.
Strong analytical judgment and initiative.
Clear communication skills across technical and non-technical audiences.

Responsibilities

Establish and manage a multidisciplinary team of AI safety specialists forming the lab.
Build and leverage state-of-the-art model evaluation infrastructure and frameworks.
Work with AI safety researchers from the NRC and CAISI partners in designing, running and documenting AI safety evaluations.
Establish and steward standardized, reproducible safety evaluation protocols.
Develop and curate new AI safety benchmarks.
Translate technical findings into practical solutions and advice, such as new tools for evaluators and developers, or recommendations to policymakers.

Benefits

Robust pension plan
Comprehensive health and dental coverage
Disability and life insurance
Office closure at the end of December
Additional supports to enhance your well-being throughout your career and beyond
Bilingualism bonus
