RL has become the de-facto method of post-training LLMs. We are limited by the sample efficiency of the current policy gradient algorithms in use today, and are looking for a talented researcher to weave together pre-LLM and post-LLM approaches to learning from experience.
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Senior
Education Level
Ph.D. or professional degree