About The Position

Tavus is a research lab pioneering human computing, focused on building AI Humans as a new interface to bridge the gap between people and machines. These AI Humans are designed to see, hear, respond, and look real, enabling meaningful, face-to-face conversations free from the friction of current systems. They combine human emotional intelligence with machine reach and reliability, offering capable, trusted agents available 24/7 in every language. Tavus aims to allow individuals, enterprises, and developers to build AI Humans for empathetic connection, understanding, and action at scale. The company is a Series B startup backed by investors including Sequoia Capital, Y Combinator, and Scale Venture Partners. The role is for an AI Researcher to join the core AI team, tasked with pushing the boundaries of foundation multimodal conversational models. The ideal candidate thrives in fast-moving startup environments, enjoys experimenting with new ideas, and wants to see their work implemented in production.

Requirements

  • A PhD (or near completion) in a relevant field, or equivalent research experience.
  • Hands-on experience with Large Multimodal Models and a strong foundation in generative (language) models. This could be in the context of tasks such as VQA, audio/video understanding, captioning, behavioral analysis, translation, or speech-to-speech systems.
  • Experience in fine-tuning/adapting VLMs for control, conditioning, or downstream tasks.
  • Solid background in deep learning and foundation models.
  • Strong PyTorch skills and comfort building deep learning pipelines.

Nice To Haves

  • Knowledge of large-scale model training and optimization.
  • Experience with duplex conversational models.
  • Broader understanding of generative AI across modalities.
  • Exposure to software development best practices.
  • A flexible, experimental mindset, i.e., comfortable working across research and engineering.
  • (Bonus) Publications at EMNLP, COLING, NeurIPS, ICLR, CVPR, ICCV.

Responsibilities

  • Conduct research on Large Multimodal Models in the context of Conversational Avatars (e.g. Neural Avatars, Talking-Heads).
  • Develop methods to model both verbal and non-verbal aspects of conversation, adapting and controlling avatar behavior in real time with low latency.
  • Experiment with fine-tuning, adaptation, and conditioning techniques to make audiovisual multimodal models more expressive, controllable, and task-specific.
  • Partner with the Applied ML team to take research from prototype to production.
  • Stay up to date with cutting-edge advancements — and help define what comes next.

What This Job Offers

Job Type

Full-time

Career Level

Senior

Education Level

Ph.D. or professional degree

Number of Employees

1-10 employees