Reinforcement Learning Engineer

NBCUniversal • Montreal, QC

About The Position

We are seeking a Reinforcement Learning Engineer with experience manipulating virtual environments to train autonomous agents. This role focuses on the design of robust simulation environments, reward structures, and policy architectures that can navigate complex, multi-sensor landscapes. You will play a key role in bridging simulation and real-world performance by developing scalable RL systems and ensuring reliable agent behavior in varied conditions.

Requirements

Graduate degree (Master’s or PhD) in Robotics, Computer Science, AI, or a related field with a focus on Reinforcement Learning, Imitation Learning, or other Online Machine Learning fields.
Proven experience as an RL Engineer or Research Engineer in a fast-paced environment.
Prior experience in industries with complex multi-disciplinary teams such as robotics, smart grids, precision agriculture, game development, or aerospace.
Fluency with Python, Git, and the Unix shell.
Deep familiarity with frameworks like Ray RLlib, Stable Baselines3, or CleanRL.
Experience with physics engines (MuJoCo, Bullet) or 3D game engines.
Familiarity with collaborative tools such as Jira/Confluence, Slack, a Git server, and an experiment tracking framework.
Strong Mathematical Background: Essential for understanding Markov Decision Processes (MDPs) and gradient-based optimization.
High Attention to Detail: Critical for debugging non-deterministic agent behaviors and ensuring environment parity.

Responsibilities

Work with partner ML and Annotation engineers and TPMs to spec out data, simulation, and training requirements.
Build and maintain high-fidelity 2D/3D simulation environments (using tools like Unity, Unreal, or Isaac Sim) that serve as the training ground for RL agents.
Design and tune complex reward functions that align agent behavior with product goals and safety constraints.
Develop and optimize RL algorithms (e.g., PPO, SAC, or Offline RL) capable of handling high-dimensional 3D observation spaces.
Analyze the "reality gap" and implement domain randomization or adaptation techniques to ensure models perform reliably in real-world scenarios.
