AI Researcher (Multimodal Perception Models)

Tavus • San Francisco, CA

About The Position

Tavus is a research lab pioneering human computing. We’re building AI Humans: a new interface that closes the gap between people and machines, free from the friction of today’s systems. Our real-time human simulation models let machines see, hear, respond, and even look real, enabling meaningful, face-to-face conversations. AI Humans combine the emotional intelligence of humans with the reach and reliability of machines, making them capable, trusted agents available 24/7, in every language, on our terms.

Imagine a therapist anyone can afford. A personal trainer that adapts to your schedule. A fleet of medical assistants that can give every patient the attention they need. With Tavus, individuals, enterprises, and developers can all build AI Humans to connect, understand, and act with empathy at scale.

We’re a Series A company backed by world-class investors including Sequoia Capital, Y Combinator, and Scale Venture Partners. Be part of shaping a future where humans and machines truly understand each other.

Requirements

A PhD (or near completion) in a relevant field, or equivalent hands-on research experience.
Experience modeling and generating human behavior (facial expressions, affect, or speech), ideally in conversational or interactive settings.
Deep understanding of sequence modeling in video/audio/language domains.
Familiarity with large model training, especially LLMs or VLMs.
Strong background in Deep Learning (from Transformers to Diffusion Models) and how to make them work in practice.
Excellent programming skills, especially in PyTorch.

Nice To Haves

Publications in top-tier conferences like CVPR, ICCV, NeurIPS, ECCV, or ACM MM.
Broader understanding of generative AI and multimodal architectures.
Familiarity with software engineering best practices.
Curiosity and a flexible mindset — you like building and experimenting.

Responsibilities

Conduct research on Foundational Multimodal Models in the context of Conversational Avatars (e.g., Neural Avatars, Talking-Heads).
Model video, audio, and language sequences using autoregressive, predictive (e.g., V-JEPA), and/or diffusion architectures, with an emphasis on temporal and sequential data rather than static images.
Collaborate with the Applied ML team to bring your work to life in production systems.
Stay at the cutting edge of multimodal learning and help us define what 'cutting edge' means next.

Benefits

Flexible work schedule.
Unlimited PTO.
Competitive healthcare.
Gear stipends.
Supportive and diverse team culture.

What This Job Offers

Career Level

Entry Level

Education Level

Ph.D. or professional degree

Number of Employees

11-50 employees
