This project focuses on improving large language model (LLM) inference performance on the ALCF inference engine, currently built on vLLM, a high-throughput LLM serving framework, and Ray, a distributed computing platform for scalable workload orchestration. Emphasis will be placed on the Sophia environment to evaluate scalability, latency, throughput, and resource utilization. The work will explore system-level and framework-level optimizations to improve efficiency for production AI workloads.

Education and Experience Requirements

The entirety of the appointment must be conducted within the United States.

Applicants must meet one of the following:
- Be currently enrolled in undergraduate or graduate studies at an accredited institution;
- Have graduated from an accredited institution within the past 3 months; or
- Be actively enrolled in a graduate program at an accredited institution.

Applicants also:
- Must be 18 years or older at the time the appointment begins.
- Must possess a cumulative GPA of 3.0 on a 4.0 scale.
- May be required, upon accepting an offer, to complete pre-employment drug testing, depending on appointment length. All students remain subject to applicable drug testing policies.
- Must complete a satisfactory background check.
Job Type: Full-time
Career Level: Intern
Education Level: No Education Listed