Summer Research Intern

Abaka AI•Palo Alto, CA

55d•$25 - $60•Onsite

About The Position

We’re looking for Summer Research Interns to help build high-quality datasets, benchmarks, and evaluation pipelines across LLMs, vision, video, 3D/4D, multimodal reasoning, agentic systems, and world models. In this role, you’ll work closely with our internal research team and external collaborators from the 2077AI Foundation, contributing to research artifacts that are actively used by leading AI labs and academic groups. This internship is ideal for students passionate about evaluation science, dataset construction, and applied AI research at scale.

Requirements

Strong background in computer science, artificial intelligence, robotics, data engineering, or related fields.
Hands-on experience with machine learning or multimodal systems, including LLMs, vision models, or video models.
Proficiency in Python and experience with PyTorch or similar frameworks.
Strong analytical skills and the ability to reason about model behavior and data quality.
Excellent written and verbal English communication skills.

Nice To Haves

Experience with LLM or multimodal evaluation frameworks (e.g., LM Eval Harness, OpenCompass).
Background in computer vision, video understanding, or multimodal learning.
Experience with 3D/4D data pipelines, graphics, or robotics tools (e.g., Blender, COLMAP, PyTorch3D, Open3D).
Familiarity with NeRFs, Gaussian Splatting, SLAM, or embodied AI datasets and simulators.
Experience with video QA, action recognition, or long-context transformer models.
Relevant research experience or publications at top-tier conferences.

Responsibilities

Design and construct high-quality datasets and benchmarks in one or more of the following areas: LLM reasoning and QA (graduate/PhD-level difficulty); vision and vision-language modeling; video understanding, temporal reasoning, and multimodal QA; and 3D/4D perception, embodied AI, and spatial reasoning.
Evaluate LLMs, VLMs, Video-LLMs, and multimodal models on reasoning, factuality, temporal understanding, and spatial tasks.
Develop and maintain evaluation pipelines, metrics, and quality-control criteria for expert-level data generation.
Analyze model outputs, build error taxonomies, conduct failure analysis, and summarize insights for internal reports and research papers.
Support research on long-context modeling, data efficiency, compression strategies, and benchmark standardization.
Contribute to open-source datasets, benchmarks, and public leaderboards in collaboration with the 2077AI Foundation.

Benefits

This is a paid internship, with a compensation range of $25–$60 per hour, depending on experience and qualifications.
Interns will work directly with experienced researchers, contribute to high-impact open-source benchmarks and datasets, and gain high-ownership experience shaping evaluation pipelines used by real AI teams.
Exceptional performance may lead to future consideration for full-time opportunities.
