Research Assistant

University of North Texas System, Denton, TX
Onsite

About The Position

The Division of Vocal Studies in the College of Music is hiring a Research Assistant for Vocal Pedagogy research. The Research Assistant will support the development and continuous improvement of a deep learning pipeline designed to analyze laryngostroboscopic imaging of singers. This includes organizing and preprocessing video and frame data, fine-tuning vision models, deploying an end-to-end inference workflow, integrating human-in-the-loop feedback, and driving model performance toward ≥ 90% accuracy. The Research Assistant will also co-mentor student researchers involved in the project.

Requirements

  • Master’s degree (or equivalent experience) in Computer Science, Data Engineering, Machine Learning, Biomedical Imaging, or a related field.
  • Proficiency in Python and deep-learning frameworks (PyTorch or TensorFlow/Keras), plus libraries such as timm, XGBoost, MoviePy, Pandas, and NumPy.
  • Hands-on experience with vision backbones (transformers and/or advanced CNNs) and multi-output regression.
  • Strong skills in image/video preprocessing, class balancing, and model checkpoint management.
  • Familiarity with human-in-the-loop feedback workflows and active-learning strategies.

Nice To Haves

  • Experience containerizing or deploying ML services using Docker, FastAPI, or Streamlit.
  • Knowledge of experiment-tracking tools (TensorBoard, MLflow).
  • Excellent written and verbal communication; proven ability to collaborate in interdisciplinary teams.
  • Background in laryngeal imaging, stroboscopy, or voice science.

Responsibilities

  • Organize and version raw and processed videos/frames in local storage and OneDrive using structured manifests and Git/DVC (see sketch 1 after this list).
  • Implement balanced sampling and augmentation pipelines to correct class imbalance across the Mode, Density, and Color labels (sketch 2).
  • Fine-tune and experiment with state-of-the-art vision backbones (e.g., Vision Transformer, Swin/ConvNeXt, EfficientNet, 3-D CNNs, hybrid CNN-Transformers) to classify Mode, Density, and Color (sketch 3).
  • Extract deep visual features and evaluate a variety of downstream learners (e.g., XGBoost, fully connected nets, tabular transformers, ensemble regressors) to predict all 32 physiological rating parameters (sketch 4).
  • Run training cycles of 25+ epochs in VS Code or Google Colab with systematic checkpointing and metrics logging (sketch 5).
  • Develop scripts that autonomously parse raw laryngostroboscopic video files and extract frames for the corresponding Class and Subclass (sketch 6).
  • For each extracted frame, run the trained pipeline to classify Mode, Density, and Color, then predict all 32 physiological rating parameters.
  • Collate results, including generated timestamps, class labels, and ratings, into a single Excel report matching the original Training Data Sheet’s structure (sketch 7).
  • Build an interactive CLI / Streamlit interface so users can confirm or correct model predictions (sketch 8).
  • Store verified feedback and schedule periodic retraining to incorporate corrections, driving accuracy toward ≥ 90% (sketch 9).
  • Co-mentor student researchers involved in the project.
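
Sketch 1, for the manifest item above, is a minimal illustration rather than the project's actual tooling: the data/frames layout, column set, and manifest.csv name are assumptions, and the real workflow would commit the manifest to Git while tracking the media files with DVC.

```python
# Sketch 1: build a structured manifest of frame files for Git/DVC versioning.
# The data/frames layout, columns, and manifest.csv name are assumptions.
import hashlib
from pathlib import Path

import pandas as pd

rows = []
for frame in sorted(Path("data/frames").glob("*.png")):
    rows.append({
        "path": str(frame),                                  # file location
        "md5": hashlib.md5(frame.read_bytes()).hexdigest(),  # content checksum
        "bytes": frame.stat().st_size,                       # file size
    })

pd.DataFrame(rows).to_csv("manifest.csv", index=False)
```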
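
Sketch 2 shows one common way to implement the balanced-sampling item: PyTorch's WeightedRandomSampler with inverse-frequency weights. The dummy tensors and the three-class Mode head are placeholders for the real frame dataset.

```python
# Sketch 2: oversample rare classes with a WeightedRandomSampler.
# The 100 random frames and 3 Mode classes are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

frames = torch.randn(100, 3, 224, 224)     # stand-in for preprocessed frames
mode_labels = torch.randint(0, 3, (100,))  # stand-in for Mode labels
dataset = TensorDataset(frames, mode_labels)

# Weight every sample by the inverse frequency of its class so that
# minority classes are drawn about as often as majority ones.
class_counts = torch.bincount(mode_labels, minlength=3).float()
sample_weights = 1.0 / class_counts[mode_labels]

sampler = WeightedRandomSampler(sample_weights, num_samples=len(dataset), replacement=True)
loader = DataLoader(dataset, batch_size=16, sampler=sampler)
```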
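
Sketch 3 is a minimal fine-tuning pass over one backbone. The model name, optimizer, and learning rate are illustrative choices, and `loader` carries over from sketch 2.

```python
# Sketch 3: fine-tune a pretrained Vision Transformer for the Mode head.
# Model name and hyperparameters are assumptions; `loader` comes from sketch 2.
import timm
import torch

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=3)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)  # logits vs. Mode labels
    loss.backward()
    optimizer.step()
```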
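
Sketch 4 pairs a feature-extracting backbone with a single downstream learner. Wrapping XGBRegressor in scikit-learn's MultiOutputRegressor is one of several ways to cover all 32 targets; the shapes and random data are placeholders.

```python
# Sketch 4: pool backbone features, then fit one multi-output regressor
# for the 32 rating parameters. Shapes and random data are placeholders.
import numpy as np
import timm
import torch
from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor

# num_classes=0 makes timm return pooled features instead of class logits.
backbone = timm.create_model("convnext_tiny", pretrained=True, num_classes=0)
backbone.eval()

with torch.no_grad():
    features = backbone(torch.randn(100, 3, 224, 224)).numpy()

ratings = np.random.rand(100, 32)          # placeholder rating targets
regressor = MultiOutputRegressor(XGBRegressor(n_estimators=200))
regressor.fit(features, ratings)
predictions = regressor.predict(features)  # shape: (100, 32)
```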
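
Sketch 5 runs a 25-epoch cycle with per-epoch checkpoints and TensorBoard logging, reusing `model`, `optimizer`, `criterion`, and `loader` from sketches 2 and 3. The run and checkpoint paths are assumptions.

```python
# Sketch 5: 25-epoch training with per-epoch checkpoints and TensorBoard
# logging. Reuses model/optimizer/criterion/loader from sketches 2-3.
from pathlib import Path

import torch
from torch.utils.tensorboard import SummaryWriter

Path("checkpoints").mkdir(exist_ok=True)
writer = SummaryWriter("runs/mode_classifier")

for epoch in range(25):
    model.train()
    running_loss = 0.0
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    writer.add_scalar("train/loss", running_loss / len(loader), epoch)
    torch.save(
        {"epoch": epoch, "model": model.state_dict(), "optimizer": optimizer.state_dict()},
        f"checkpoints/mode_epoch{epoch:02d}.pt",
    )

writer.close()
```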
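
Sketch 6 extracts one frame per second from each raw video with MoviePy. The directory layout, the .mp4 extension, and the 1 fps rate are assumptions; the real script would also resolve each file's Class and Subclass.

```python
# Sketch 6: save one frame per second from each raw video as a PNG.
# Paths, extension, and sampling rate are assumptions.
from pathlib import Path

from moviepy.editor import VideoFileClip  # `from moviepy import VideoFileClip` on moviepy >= 2.0

raw_dir = Path("data/raw_videos")
frame_dir = Path("data/frames")
frame_dir.mkdir(parents=True, exist_ok=True)

for video_path in sorted(raw_dir.glob("*.mp4")):
    clip = VideoFileClip(str(video_path))
    for t in range(int(clip.duration)):
        clip.save_frame(str(frame_dir / f"{video_path.stem}_t{t:04d}.png"), t=t)
    clip.close()
```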
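
Sketch 7 collates per-frame results into one Excel file with pandas. Every column name here is an assumption; the real report must mirror the original Training Data Sheet's layout.

```python
# Sketch 7: collate per-frame results into a single Excel report.
# Column names are assumptions; writing .xlsx requires openpyxl.
import pandas as pd

rows = [
    {"Timestamp": "00:00:01", "Mode": "A", "Density": "High", "Color": "Pink",
     **{f"Rating_{i + 1}": 0.0 for i in range(32)}},  # placeholder ratings
]

pd.DataFrame(rows).to_excel("inference_report.xlsx", index=False)
```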
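
Sketch 8 is a skeletal Streamlit page (a hypothetical review_app.py) that displays a frame with its predicted Mode and records the reviewer's confirmation or correction. The label set and feedback.csv format are assumptions.

```python
# Sketch 8: confirm-or-correct review page. Frame path, label options,
# and feedback.csv are placeholders. Run with: streamlit run review_app.py
import pandas as pd
import streamlit as st

frame_path = "data/frames/example_t0001.png"  # placeholder frame
modes = ["A", "B", "C"]                       # placeholder label set
predicted_mode = "A"                          # placeholder prediction

st.image(frame_path, caption=f"Predicted Mode: {predicted_mode}")
corrected = st.selectbox("Mode", modes, index=modes.index(predicted_mode))

if st.button("Save verdict"):
    pd.DataFrame([{"frame": frame_path, "mode": corrected}]).to_csv(
        "feedback.csv", mode="a", header=False, index=False
    )
    st.success("Feedback stored.")
```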
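
Sketch 9 folds verified corrections back into the training labels ahead of a scheduled retraining run. The file names and the single corrected column are assumptions; the retrain itself would be triggered by whatever scheduler the team uses.

```python
# Sketch 9: merge reviewer corrections into the training labels before a
# retraining run. labels.csv and feedback.csv are hypothetical files.
import pandas as pd

labels = pd.read_csv("labels.csv")                               # current training labels
feedback = pd.read_csv("feedback.csv", names=["frame", "mode"])  # verified corrections

# Corrections override the stored labels for any matching frame.
merged = labels.set_index("frame")
merged.update(feedback.drop_duplicates("frame", keep="last").set_index("frame"))
merged.reset_index().to_csv("labels.csv", index=False)
```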

Benefits

  • Information regarding our Benefits can be found at https://hr.untsystem.edu/benefits/index.php.