This role involves training large language models (LLMs) to generate production-grade code across multiple programming languages. Core tasks include comparing and ranking code snippets, explaining why the top-ranked snippet is best, and repairing and refactoring AI-generated code so that it is correct, efficient, and consistent with style guidelines. You will also feed ratings, edits, and test results into the Reinforcement Learning from Human Feedback (RLHF) pipeline and help keep that pipeline running smoothly. The goal is to teach the model to propose, critique, and improve code the way an expert engineer would. The RLHF process has four stages: the model generates code; expert engineers rank, edit, and justify their choices; this feedback is converted into reward signals; and reinforcement learning fine-tunes the model toward producing ship-ready code.
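As a rough illustration of the "feedback into reward signals" stage, the sketch below expands one engineer's best-to-worst ranking into pairwise preferences, a format often used to train a reward model. The posting does not describe the actual pipeline, so the function name `ranking_to_pairs`, the record keys, and the snippet IDs are hypothetical.

```python
from itertools import combinations


def ranking_to_pairs(ranked_snippets):
    """Expand a best-to-worst ranking into (chosen, rejected) preference pairs.

    Hypothetical helper: the real pipeline's data format is not specified.
    """
    # combinations() keeps input order, so in every pair the first item
    # was ranked above the second by the reviewer.
    return [
        {"chosen": better, "rejected": worse}
        for better, worse in combinations(ranked_snippets, 2)
    ]


# Example: a reviewer ranked three candidate snippets, best first.
ranking = ["snippet_b", "snippet_a", "snippet_c"]
pairs = ranking_to_pairs(ranking)
for pair in pairs:
    print(pair["chosen"], ">", pair["rejected"])
```

A ranking of n snippets yields n(n-1)/2 pairs, so a single careful ranking supplies several training comparisons at once.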
Job Type: Part-time
Career Level: Mid Level
Education Level: No Education Listed
Number of Employees: 11-50 employees