About The Position

Reality Labs is building the future of connection through world-class AR/VR hardware and software. The XR Tech AIX (AI Experiences) team is developing cutting-edge real-time AI systems that power next-generation communication experiences. We are creating intelligent agents that seamlessly interface with fine-tuned foundation models to enable rich, real-time interactions in video calling and telepresence scenarios.

We are seeking an exceptional Research Scientist Intern to join our team and contribute to the development of real-time multimodal AI systems. This role focuses on fine-tuning and optimizing large foundation models—particularly vision-language models—for real-time agent-based applications. You will work at the intersection of multimodal learning, real-time systems, and agentic AI.

Our internships are twelve (12) to twenty-four (24) weeks long with a flexible summer start date.

Requirements

  • Currently has, or is in the process of obtaining, a PhD degree in Computer Science, Machine Learning, Electrical Engineering, or a related field
  • 2+ years of research experience in one or more of the following areas: multimodal learning, vision-language models, large language models, or foundation model fine-tuning
  • Hands-on experience fine-tuning large foundation models (e.g., LLaVA, InternVL, Qwen-VL, LLaMA, or similar)
  • Strong programming skills in Python
  • Experience with deep learning frameworks such as PyTorch
  • Excellent communication skills and ability to work independently
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment

Nice To Haves

  • Proven track record of achieving significant results as demonstrated by first-authored publications at leading conferences such as NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ICASSP, Interspeech, ACL, EMNLP, or similar
  • Experience with speech-to-speech LLMs or audio-visual foundation models
  • Familiarity with real-time communication systems (e.g., LiveKit, WebRTC) or low-latency inference optimization
  • Experience with cloud infrastructure (AWS) and containerization (Docker)
  • Experience with parameter-efficient fine-tuning techniques (LoRA, QLoRA, adapters, etc.)
  • Experience with agentic AI systems, tool-use, or function-calling in LLMs
  • Demonstrated software engineering experience via internships, work experience, or contributions to open source repositories (e.g., GitHub)
  • Intent to return to degree program after completion of the internship

Responsibilities

  • Research and develop novel approaches for fine-tuning large multimodal foundation models (vision-language, audio-visual) for real-time applications
  • Design and implement efficient inference pipelines for deploying fine-tuned models in real-time communication scenarios
  • Explore agentic architectures that leverage fine-tuned models as tools within larger AI systems
  • Collaborate with cross-functional teams to integrate models into prototype experiences
  • Document and present research progress with the goal of publishing findings at top-tier ML/CV conferences
  • Contribute to building working prototypes that demonstrate the capabilities of fine-tuned multimodal models