Centific AI Research seeks a PhD Research Intern to design and evaluate reinforcement learning (RL) systems for agentic AI workflows. You will develop RL environments, reward models, and post-training pipelines for LLM-based agents, translating research into practical enterprise solutions.

The scope of work includes end-to-end RL pipelines for agentic systems (simulation → training → evaluation); alignment of LLM-based agents using RLHF, DPO, PPO, and emerging methods; design of reward functions, verifiers, and evaluation frameworks; simulation environments (digital twins) for enterprise workflows; and scalable training and inference for RL-based systems.

Example projects include building a custom RL environment that simulates a real-world enterprise workflow and training an agent on it using PPO or GRPO; developing a reward modeling pipeline from human feedback and evaluating alignment improvements; creating an evaluation harness that measures reasoning, task success, and policy safety; prototyping an agentic system with tool use and multi-step reasoning, integrated with RL training; and documenting experiments, ablations, and findings for research and productionization.
Job Type: Full-time
Career Level: Intern
Education Level: Ph.D. or professional degree