We’re seeking a Machine Learning Engineer who thrives at the frontier of foundation-model research and production engineering. You’ll help define how machines learn from motion: training and fine-tuning large-scale Vision-Language Models to reason about complex, real-world video. Your work will involve building multi-modal architectures that perceive, localize, and describe motion events (turns, lane changes, interactions, anomalies) across millions of frames, and turning those breakthroughs into robust APIs and SDKs used by enterprise customers.

You’ll work directly with the founders to:
Train and evaluate VLMs specialized for motion understanding in autonomous-driving and robotics datasets.
Design and scale GPU-accelerated pipelines for training, fine-tuning, and inference on multi-modal data (video + language + sensor metadata).
Build agentic evaluation frameworks that benchmark spatiotemporal reasoning, localization accuracy, and narrative consistency.
Develop and productionize curation loops that use our own models to generate and refine datasets (“AI training AI”).
Publish high-impact research (e.g., NeurIPS, CVPR) while shipping features that customers use immediately.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed
Number of Employees
11-50 employees