- Design, build, and iterate on MuJoCo simulation environments for robotics research and AI training
- Implement and tune RL algorithms (PPO, SAC, TD3) to train agents on simulated tasks
- Define reward functions, observation spaces, and action spaces that produce robust, transferable policies
- Debug and optimize physics simulations: contact models, actuator dynamics, scene configs
- Evaluate trained policies for stability, generalization, and sim-to-real transfer potential
- Document environment specs, training procedures, and experimental results clearly
- Collaborate async with research teams and stay current with advances in robot learning and embodied AI

RLHF in one line: Generate code → expert engineers rank, edit, and justify → convert that feedback into reward signals → reinforcement learning tunes the model toward code you'd actually ship.
Career Level: Mid Level
Number of Employees: 11-50