As a Staff Machine Learning Engineer, Multimodal Modeling, you will lead the advancement of our core embedding-based retrieval systems, with a primary focus on the scientific aspects of modeling. This includes fine-tuning and extending multimodal models (e.g., CLIP, SigLIP) to improve performance, generalization, and cross-modal alignment. You’ll work on unifying text and image representations and ensuring extensibility across evolving product use cases. Your work will be central to Flock’s ability to deliver fast, accurate, and scalable search experiences powered by state-of-the-art vision-language systems.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed
Number of Employees
501-1,000 employees