Research Engineer, Computer Vision

Meta, Pittsburgh, PA
$121,992 - $181,000

About The Position

As a Research Engineer focused on Multi-Modal Understanding, you will develop advanced algorithms that integrate computer vision with other modalities such as language, audio, and sensor data. You will also drive the curation of multi-modal datasets and ground truth annotation pipelines to support model training and evaluation. You will work closely with our research team to bring innovative multi-modal solutions to production, bridging the gap between visual perception and holistic contextual understanding for immersive applications.

Requirements

  • Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
  • Proven experience with C++ and/or Python, including modern language features
  • Experience working with deep learning frameworks such as PyTorch and TensorFlow
  • Demonstrated experience working collaboratively in cross-functional teams

Nice To Haves

  • Master's degree in Computer Science, Computer Vision, Machine Learning, or related field
  • Experience with vision-language models or multi-modal transformers
  • Publications or contributions to multi-modal understanding research
  • Familiarity with large language models and their integration with visual understanding systems
  • Experience with data curation, annotation tools, or ground truth labeling pipelines

Responsibilities

  • Design and implement multi-modal understanding systems that combine vision, language, and other sensory inputs to enable richer contextual awareness
  • Develop algorithms for cross-modal learning, fusion, and reasoning to improve human-AI interaction
  • Lead the curation and management of multi-modal datasets, ensuring data quality and diversity across vision, language, and sensor modalities
  • Design and oversee ground truth annotation workflows and quality assurance processes for multi-modal data
  • Independently complete medium to large features that span multiple tasks, with minimal to no guidance
  • Collaborate with researchers and engineers across computer vision and machine learning teams to drive multi-modal innovation
  • Develop well-organized code with proper testing and documentation, building production-ready multi-modal systems

Benefits

  • Bonus
  • Equity