Want to build vision-language models that understand complex, real-world environments? You’ll join a small, highly technical team working on foundational problems in multimodal AI, with an emphasis on training models that can interpret, reason about, and act on large-scale first-person video data. You’ll work directly with the Chief Science Officer, shaping how models are designed, trained, and evaluated. The work sits at the intersection of VLMs, long-context reasoning, and real-world deployment. The goal is to build systems that move beyond static perception towards temporal understanding, activity recognition, and higher-level reasoning across dynamic environments.
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed