Vision Language Model Engineer

EchoTwin AI · San Francisco, CA

About The Position

As a Vision Language Model Engineer, you will design, develop, and optimize advanced vision-language models that integrate visual and textual data to enable intelligent systems. You will work closely with cross-functional teams to build models that power applications such as image captioning, visual question answering, and multimodal AI at the edge.

Requirements

  • Bachelor’s, Master’s, or Ph.D. in Computer Science, Machine Learning, Artificial Intelligence, or a related field (or equivalent experience).
  • 3+ years of experience in machine learning, with a focus on vision-language models or multimodal AI.
  • Hands-on experience with deep learning frameworks such as PyTorch or TensorFlow.
  • Proven track record of building and deploying computer vision and/or NLP models.
  • Proficiency in Python and relevant ML libraries (e.g., Hugging Face Transformers, OpenCV).
  • Experience with large-scale model training and optimization (e.g., distributed training, quantization).
  • Strong understanding of neural network architectures (e.g., CNNs, Transformers, CLIP, or similar); see the CLIP sketch after this list.
  • Experience with multimodal datasets and preprocessing techniques for images and text.
  • Familiarity with cloud platforms (e.g., AWS, GCP, Azure) and model deployment workflows.
  • Strong problem-solving skills and ability to work in a fast-paced, collaborative environment.
  • Excellent communication skills to explain complex technical concepts to diverse audiences.
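As a concrete reference point for the skills above, here is a minimal, illustrative sketch of zero-shot image-text matching with a public CLIP checkpoint via Hugging Face Transformers. The checkpoint name, image path, and candidate captions are assumptions for illustration, not a description of EchoTwin AI's stack.

```python
# Minimal sketch: zero-shot image-text matching with CLIP.
# The checkpoint and image path are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

checkpoint = "openai/clip-vit-base-patch32"  # assumed public checkpoint
model = CLIPModel.from_pretrained(checkpoint)
processor = CLIPProcessor.from_pretrained(checkpoint)

image = Image.open("example.jpg")  # hypothetical local image
captions = ["a photo of a cat", "a photo of a dog", "a city skyline"]

# Preprocess both modalities in one call, then score every caption
# against the image in a single forward pass.
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image has shape (num_images, num_captions); softmax turns
# the similarity scores into a distribution over candidate captions.
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.3f}  {caption}")
```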

Responsibilities

  • Design and implement state-of-the-art vision-language models using deep learning frameworks.
  • Develop and fine-tune models that combine computer vision and natural language processing for tasks like image captioning, visual question answering, and text-to-image generation (see the captioning sketch after this list).
  • Collaborate with data scientists and software engineers to integrate models into production systems.
  • Optimize model performance for accuracy, latency, and scalability in real-world applications.
  • Conduct experiments to evaluate model performance and iterate on architectures and training pipelines.
  • Stay up to date with the latest research in vision-language models and incorporate advancements into projects.
  • Contribute to data preprocessing, augmentation, and annotation pipelines for multimodal datasets.
  • Document model development processes and present findings to technical and non-technical stakeholders.
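To make the captioning responsibility concrete, the sketch below generates a caption with a pretrained BLIP model through Hugging Face Transformers. It is an illustrative example under assumed names (the Salesforce/blip-image-captioning-base checkpoint and a local image path), not production code.

```python
# Minimal sketch: image captioning inference with a pretrained BLIP model.
# The checkpoint and image path are illustrative assumptions.
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

checkpoint = "Salesforce/blip-image-captioning-base"  # assumed public checkpoint
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForConditionalGeneration.from_pretrained(checkpoint)

image = Image.open("example.jpg").convert("RGB")  # hypothetical local image
inputs = processor(images=image, return_tensors="pt")

# Autoregressively decode a caption conditioned on the image features.
generated_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(generated_ids[0], skip_special_tokens=True))
```

Fine-tuning follows the same shape: the processor builds paired image-text batches, and the model is trained with the standard language-modeling loss over the caption tokens.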

Benefits

  • Endless learning and development opportunities from a highly diverse and talented peer group.
  • Options for medical, dental, and vision coverage for employees and dependents (for US employees).
  • Flexible Spending Account (FSA) and Dependent Care Flexible Spending Account (DCFSA).
  • 401(k) with 3% company matching.
  • Unlimited PTO.
  • Profit sharing.