Senior Applied Researcher

Techire Ai • San Francisco, CA

12h • Onsite

About The Position

Want to build vision-language models (VLMs) that understand complex, real-world environments? You’ll join a small, highly technical team working on foundational problems in multimodal AI, training models that can interpret, reason about, and act on large-scale first-person video data. You’ll work directly with the Chief Science Officer, shaping how models are designed, trained, and evaluated. The work sits at the intersection of VLMs, long-context reasoning, and real-world deployment. The goal is to build systems that move beyond static perception towards temporal understanding, activity recognition, and higher-level reasoning across dynamic environments.

Requirements

Strong experience training deep learning models, ideally transformer-based
Hands-on work in vision, language, or multimodal systems
Experience with large datasets, model optimisation, or deploying models into production environments will be valuable
Exposure to video data or long-context modelling is particularly relevant

Responsibilities

Designing and training VLMs on large-scale video datasets
Developing post-training approaches including SFT, RLHF, and parameter-efficient tuning
Building scalable training and evaluation pipelines
Exploring long-context and temporal modelling
Designing efficient systems across edge and server-side inference
Defining benchmarks for spatial and behavioural understanding