Senior ML/RL Engineer, Behavior Planning

Bot Auto • Houston, TX

About The Position

At Bot Auto, we are revolutionizing the transportation of goods with our cutting-edge autonomous trucks, enhancing the quality of life for communities around the globe. With the agility of a startup and the wisdom of seasoned experts, our team has achieved numerous world-firsts and unparalleled innovations. United by a shared vision, we create groundbreaking solutions that propel the future of transportation. Join us and transform your ideas into reality.

We are seeking a Senior ML/RL Engineer to join our Algo team and drive the development of our unified behavioral architecture. In this role, you will help bridge the gap between simulation and the real world by developing a scalable policy framework that represents both our L4 ego-policy and a diverse population of simulated agents. You will work at the intersection of Multi-Agent Reinforcement Learning (MARL) and safety-critical system design to ensure our autonomous semi-trucks navigate highways with superhuman safety and precision.

Requirements

Proven track record of training and deploying deep RL algorithms (e.g., PPO, SAC) for complex, real-world robotic or autonomous systems.
Expertise in Python and PyTorch; strong understanding of modern deep learning architectures and optimization techniques.
MS or PhD in Computer Science, Robotics, or a related quantitative field.
Ability to diagnose and solve fundamental challenges in RL training, such as variance management and distribution shift.

Nice To Haves

Experience with constrained optimization or safety-critical learning frameworks.
Background in MARL training stability, including self-play and decentralized execution strategies.
Familiarity with vehicle dynamics and behavior planning, particularly for long-haul highway environments.

Responsibilities

Develop and train diverse, conditioned policies that simulate realistic driving behaviors to stress-test and validate our autonomous driving stack.
Lead the research and implementation of advanced RL algorithms to ensure safety metrics are treated as primary constraints in the learning process.
Collaborate with cross-functional teams to design robust reward functions and evaluation metrics that balance safety, progress, and comfort.
Contribute to the optimization of our large-scale, high-throughput training environments to enable rapid iteration on complex multi-agent scenarios.
Advance our state-of-the-art neural architectures to improve spatial reasoning, long-horizon planning, and interaction modeling.
Work closely with Simulation and Planning teams to integrate research-grade models into production-quality, safety-critical software.