AI Evaluation Specialist (CS-02)

National Research Council Canada•Ottawa, ON

CA$86,503 - CA$108,068•Hybrid

About The Position

We are looking for a CS-02 AI Evaluation Specialist to join our dynamic team at the National Research Council (NRC) as we embark on ground-breaking projects related to AI safety and security. Be part of a pioneering team that will establish a new AI Safety Lab for conducting system evaluations and translating technical findings into practical solutions and recommendations for AI practitioners and policy makers. Under the guidance of the AI Safety Lab Lead, you will play a critical role in building, maintaining, and using a new infrastructure for AI safety evaluation. You will also set up, run, and monitor advanced agentic evaluations at scale on frontier AI models, while contributing to cutting-edge AI safety research conducted by the NRC in collaboration with national and international partners.

Requirements

University degree in computer science, computer engineering, or related field.
Experience developing Python-based tooling and automation for ML/AI workflows in a research or production environment.
Experience with the programmatic use and orchestration of large language models (both closed-source and open-source).
Experience with containerization technologies (such as Docker, Podman, or Apptainer).
Experience with high-performance computing (HPC) clusters and job orchestration (e.g., SLURM).
Experience documenting experimental AI results through technical reports, internal documentation, or scientific publications.
Ability to design and implement reproducible AI workflows using Python.
Ability to make sound technical decisions when using different models, accounting for factors such as API constraints, compute budgets, reproducibility, and result validity.
Ability to diagnose and resolve infrastructure issues in HPC or cloud-based evaluation environments, including resource contention, job failures, and performance bottlenecks.
Knowledge of AI safety evaluation frameworks (e.g., Inspect AI, Moonshot) and their application to frontier model assessment.
Knowledge of complex data storage and querying tools for handling large volumes of evaluation data.
Knowledge of software engineering best practices, including version control, CI/CD pipelines, and automated testing, and of related modern software development tools (e.g., Git, GitLab/GitHub).

Nice To Haves

A specialization in data analytics, machine learning, or Artificial Intelligence (AI) may be considered an asset.
A Master’s degree in a field related to the position may be considered an asset.
A college diploma with significant experience directly related to the duties of this position may be considered as an educational equivalent.
Experience setting up, running, or contributing to evaluations or benchmarking of AI systems.
Experience with red-teaming workflows, adversarial testing, or stress-testing of AI systems (including agentic systems).
Experience with AI/ML experiment tracking tools (e.g., MLflow, Weights & Biases).
Experience developing or curating AI benchmark datasets and associated documentation (e.g., through HuggingFace).
Knowledge of ML experiment tracking tools (e.g., MLflow, Weights & Biases) and their integration into reproducible research workflows.
Knowledge of agentic AI frameworks and protocols.
Familiarity with AI safety concepts, risk taxonomies, or governance frameworks (e.g., TBS guidelines, NIST AI Risk Management Framework, model evaluation practices).

Responsibilities

Build, maintain, and use a state-of-the-art model evaluation infrastructure combining on-premises and cloud-based clusters.
Support AI safety researchers leveraging the evaluation infrastructure.
Set up, run and monitor large-scale evaluations involving multiple AI models, agents and benchmarks.
Steward and document protocols for model evaluation and experiment tracking.
Contribute to developing and maintaining customized evaluation benchmarks.
Contribute to the analysis and reporting of model evaluation results.
Stay current with the latest advancements in model evaluation tools, approaches, and packages.

Benefits

Robust pension plan
Comprehensive health and dental coverage
Disability and life insurance
Office closure at the end of December
Additional supports to enhance your well-being throughout your career and beyond.
