About The Position

Tavus is a research lab pioneering human computing. We’re building AI Humans: a new interface that closes the gap between people and machines, free from the friction of today’s systems. Our real-time human simulation models let machines see, hear, respond, and even look real—enabling meaningful, face-to-face conversations. AI Humans combine the emotional intelligence of humans with the reach and reliability of machines, making them capable, trusted agents available 24/7, in every language, on our terms.

Imagine a therapist anyone can afford. A personal trainer that adapts to your schedule. A fleet of medical assistants that can give every patient the attention they need. With Tavus, individuals, enterprises, and developers can all build AI Humans to connect, understand, and act with empathy at scale.

We’re a Series A company backed by world-class investors including Sequoia Capital, Y Combinator, and Scale Venture Partners. Be part of shaping a future where humans and machines truly understand each other.

Requirements

  • A PhD plus 2–3+ years working hands-on with LLMs, VLMs, or multimodal systems.
  • Previous experience leading research efforts or mentoring teams.
  • Expertise in sequence modeling across video, audio, and text — with strong understanding of autoregressive, predictive, and diffusion frameworks.
  • Experience with large-scale model training and optimization for performance and real-time generation.
  • Proven ability to translate research ideas into production-grade systems.
  • Publications in top-tier venues (CVPR, ICCV, NeurIPS, ECCV, ACMMM).
  • Strong PyTorch skills and comfort moving fluidly between research and engineering.

Nice To Haves

  • Broad familiarity with generative AI paradigms and foundation models.
  • Comfort working across the full research–to–deployment stack.
  • A builder’s mindset: eager to experiment, iterate, and ship.

Responsibilities

  • Lead research on Foundational Multimodal Models for Conversational Avatars — systems that can perceive, reason, and generate across video, audio, and language.
  • Build and train models using Autoregressive, Predictive (e.g., V-JEPA), and Diffusion-based architectures with a deep focus on temporal and sequential data.
  • Design and execute experiments to predict and control the visual, auditory, and linguistic responses of avatars.
  • Partner with the Applied ML team to bring research into real-world use cases.
  • Mentor other researchers and drive excellence across the team.

Benefits

  • Flexible work schedule
  • Unlimited PTO
  • Competitive healthcare
  • Gear stipends
  • Diverse and supportive team culture


What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Education Level: Ph.D. or professional degree
  • Number of Employees: 11–50 employees
