This project focuses on implementing and scaling a multi-agent system (MAS) training and evaluation pipeline on Aurora as part of the AuroraGPT initiative at Argonne. I am a postdoctoral researcher partially funded by ALCF, working with Dr. Venkatram Vishwanath and Dr. Rajeev Thakur on large-scale LLM systems and evaluation. The goal is to translate a recent multi-agent research framework into a practical, runnable system on Aurora that can support large-scale experimentation and future scientific workflows.

The MAS framework studies how multiple LLM agents collaborate to solve complex tasks, and how system-level evaluation signals can be transformed into agent-level and message-level training signals that improve cooperation, reliability, and efficiency. The summer student will help implement the multi-agent orchestration, logging and trace collection, evaluation hooks, and scalable execution on Aurora, enabling controlled experiments and benchmarking at leadership scale. The work aims to demonstrate one of the first end-to-end multi-agent LLM systems running natively on Aurora and to contribute toward publishable results in multi-agent learning for science.
Job Type: Full-time
Career Level: Intern
Education Level: No Education Listed