Research Scientist / Engineer — Multimodal Agent

Luma • Redwood City, CA

About The Position

This is a rare and foundational opportunity to define the future of multimodal AI. You will be at the forefront of building and training large-scale multimodal models, directly impacting how users interact with pixels. This role offers the chance to bridge cutting-edge research with magical, shipped products, working end-to-end on novel problems with no existing playbook. The work involves both the “science” and the “engineering” sides of research, which are equally important. It is a multi-stack role in which you will work at the intersection of modeling, data, systems, and evaluation.

Requirements

Strong foundation in machine learning, foundation models, and agentic systems.
Deep understanding of agentic systems and of approaches to LLM/VLM reasoning, coding models, and LLM/VLM tool calling.
Hands-on experience with PyTorch and large-scale training (distributed, mixed precision, large datasets).

Nice To Haves

Experience with state-of-the-art foundation models in reasoning
Experience with state-of-the-art foundation models in coding
Experience with state-of-the-art foundation models in tool calling
Experience with state-of-the-art multimodal agents

Responsibilities

Architect large-scale multimodal agentic models that use reasoning, planning, coding, and tool calling to accomplish complex, multi-step multimodal work.
Hill-climb on existing tasks and formulate new tasks through data.
Design, implement, and run robust data pipelines for constructing, enriching, and filtering massive pixel datasets.
Train large-scale multimodal models on massive datasets and GPU clusters.
Define and build novel evaluation frameworks to measure multimodal agents.
