Research Assistant

University of North Texas System, Denton, TX
Onsite

About The Position

The Division of Vocal Studies in the College of Music is hiring a Research Assistant for Vocal Pedagogy research. The Research Assistant will support the development and continuous improvement of a deep learning pipeline designed to analyze laryngostroboscopic imaging of singers. This includes organizing and preprocessing video and frame data, fine-tuning vision models, deploying an end-to-end inference workflow, integrating human-in-the-loop feedback, and driving model performance toward ≥ 90% accuracy. The Research Assistant will also co-mentor student researchers involved in the project.

Requirements

  • Master’s degree (or equivalent experience) in Computer Science, Data Engineering, Machine Learning, Biomedical Imaging, or a related field.
  • Proficiency in Python and deep-learning frameworks (PyTorch or TensorFlow/Keras), plus libraries such as timm, XGBoost, MoviePy, Pandas, and NumPy.
  • Hands-on experience with vision backbones (transformers and/or advanced CNNs) and multi-output regression.
  • Strong skills in image/video preprocessing, class balancing, and model checkpoint management.
  • Familiarity with human-in-the-loop feedback workflows and active-learning strategies.

Nice To Haves

  • Experience containerizing or deploying ML services using Docker, FastAPI, or Streamlit.
  • Knowledge of experiment-tracking tools (TensorBoard, MLflow).
  • Excellent written and verbal communication; proven ability to collaborate in interdisciplinary teams.
  • Background in laryngeal imaging, stroboscopy, or voice science.

Responsibilities

  • Organize and version raw and processed videos/frames in local storage and OneDrive using structured manifests and Git/DVC (see sketch 1 after this list).
  • Implement balanced sampling and augmentation pipelines to correct class imbalance across the Mode, Density, and Color labels (sketch 2).
  • Fine-tune and experiment with state-of-the-art vision backbones (e.g., Vision Transformer, Swin/ConvNeXt, EfficientNet, 3-D CNNs, hybrid CNN-Transformers) to classify Mode, Density, and Color (sketch 3).
  • Extract deep visual features and evaluate a variety of downstream learners (e.g., XGBoost, fully connected nets, tabular transformers, ensemble regressors) to predict all 32 physiological rating parameters (sketch 4).
  • Run training cycles of 25+ epochs in VS Code or Google Colab with systematic checkpointing and metrics logging (sketch 5).
  • Develop scripts that autonomously parse raw laryngostroboscopic video files and extract frames for the corresponding Class and Subclass (sketch 6).
  • For each extracted frame, run the trained pipeline to classify Mode, Density, and Color, then predict all 32 physiological rating parameters.
  • Collate results, including generated timestamps, class labels, and ratings, into a single Excel report matching the original Training Data Sheet’s structure (sketch 7).
  • Build an interactive CLI / Streamlit interface so users can confirm or correct model predictions (sketch 8).
  • Store verified feedback and schedule periodic retraining to incorporate corrections, driving accuracy toward ≥ 90% (sketch 9).
  • Co-mentor student researchers involved in the project.
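
Sketch 1, for the manifest item above, is a minimal illustration rather than the project's actual tooling: the data/frames layout, column set, and manifest.csv name are assumptions, and the real workflow would commit the manifest to Git while tracking the media files with DVC.

```python
# Sketch 1: build a structured manifest of frame files for Git/DVC versioning.
# The data/frames layout, columns, and manifest.csv name are assumptions.
import hashlib
from pathlib import Path

import pandas as pd

rows = []
for frame in sorted(Path("data/frames").glob("*.png")):
    rows.append({
        "path": str(frame),                                  # file location
        "md5": hashlib.md5(frame.read_bytes()).hexdigest(),  # content checksum
        "bytes": frame.stat().st_size,                       # file size
    })

pd.DataFrame(rows).to_csv("manifest.csv", index=False)
```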
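
Sketch 2 shows one common way to implement the balanced-sampling item: PyTorch's WeightedRandomSampler with inverse-frequency weights. The dummy tensors and the three-class Mode head are placeholders for the real frame dataset.

```python
# Sketch 2: oversample rare classes with a WeightedRandomSampler.
# The 100 random frames and 3 Mode classes are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

frames = torch.randn(100, 3, 224, 224)     # stand-in for preprocessed frames
mode_labels = torch.randint(0, 3, (100,))  # stand-in for Mode labels
dataset = TensorDataset(frames, mode_labels)

# Weight every sample by the inverse frequency of its class so that
# minority classes are drawn about as often as majority ones.
class_counts = torch.bincount(mode_labels, minlength=3).float()
sample_weights = 1.0 / class_counts[mode_labels]

sampler = WeightedRandomSampler(sample_weights, num_samples=len(dataset), replacement=True)
loader = DataLoader(dataset, batch_size=16, sampler=sampler)
```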
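
Sketch 3 is a minimal fine-tuning pass over one backbone. The model name, optimizer, and learning rate are illustrative choices, and `loader` carries over from sketch 2.

```python
# Sketch 3: fine-tune a pretrained Vision Transformer for the Mode head.
# Model name and hyperparameters are assumptions; `loader` comes from sketch 2.
import timm
import torch

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=3)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)  # logits vs. Mode labels
    loss.backward()
    optimizer.step()
```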
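
Sketch 4 pairs a feature-extracting backbone with a single downstream learner. Wrapping XGBRegressor in scikit-learn's MultiOutputRegressor is one of several ways to cover all 32 targets; the shapes and random data are placeholders.

```python
# Sketch 4: pool backbone features, then fit one multi-output regressor
# for the 32 rating parameters. Shapes and random data are placeholders.
import numpy as np
import timm
import torch
from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor

# num_classes=0 makes timm return pooled features instead of class logits.
backbone = timm.create_model("convnext_tiny", pretrained=True, num_classes=0)
backbone.eval()

with torch.no_grad():
    features = backbone(torch.randn(100, 3, 224, 224)).numpy()

ratings = np.random.rand(100, 32)          # placeholder rating targets
regressor = MultiOutputRegressor(XGBRegressor(n_estimators=200))
regressor.fit(features, ratings)
predictions = regressor.predict(features)  # shape: (100, 32)
```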
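
Sketch 5 runs a 25-epoch cycle with per-epoch checkpoints and TensorBoard logging, reusing `model`, `optimizer`, `criterion`, and `loader` from sketches 2 and 3. The run and checkpoint paths are assumptions.

```python
# Sketch 5: 25-epoch training with per-epoch checkpoints and TensorBoard
# logging. Reuses model/optimizer/criterion/loader from sketches 2-3.
from pathlib import Path

import torch
from torch.utils.tensorboard import SummaryWriter

Path("checkpoints").mkdir(exist_ok=True)
writer = SummaryWriter("runs/mode_classifier")

for epoch in range(25):
    model.train()
    running_loss = 0.0
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    writer.add_scalar("train/loss", running_loss / len(loader), epoch)
    torch.save(
        {"epoch": epoch, "model": model.state_dict(), "optimizer": optimizer.state_dict()},
        f"checkpoints/mode_epoch{epoch:02d}.pt",
    )

writer.close()
```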
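
Sketch 6 extracts one frame per second from each raw video with MoviePy. The directory layout, the .mp4 extension, and the 1 fps rate are assumptions; the real script would also resolve each file's Class and Subclass.

```python
# Sketch 6: save one frame per second from each raw video as a PNG.
# Paths, extension, and sampling rate are assumptions.
from pathlib import Path

from moviepy.editor import VideoFileClip  # `from moviepy import VideoFileClip` on moviepy >= 2.0

raw_dir = Path("data/raw_videos")
frame_dir = Path("data/frames")
frame_dir.mkdir(parents=True, exist_ok=True)

for video_path in sorted(raw_dir.glob("*.mp4")):
    clip = VideoFileClip(str(video_path))
    for t in range(int(clip.duration)):
        clip.save_frame(str(frame_dir / f"{video_path.stem}_t{t:04d}.png"), t=t)
    clip.close()
```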
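
Sketch 7 collates per-frame results into one Excel file with pandas. Every column name here is an assumption; the real report must mirror the original Training Data Sheet's layout.

```python
# Sketch 7: collate per-frame results into a single Excel report.
# Column names are assumptions; writing .xlsx requires openpyxl.
import pandas as pd

rows = [
    {"Timestamp": "00:00:01", "Mode": "A", "Density": "High", "Color": "Pink",
     **{f"Rating_{i + 1}": 0.0 for i in range(32)}},  # placeholder ratings
]

pd.DataFrame(rows).to_excel("inference_report.xlsx", index=False)
```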
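
Sketch 8 is a skeletal Streamlit page (a hypothetical review_app.py) that displays a frame with its predicted Mode and records the reviewer's confirmation or correction. The label set and feedback.csv format are assumptions.

```python
# Sketch 8: confirm-or-correct review page. Frame path, label options,
# and feedback.csv are placeholders. Run with: streamlit run review_app.py
import pandas as pd
import streamlit as st

frame_path = "data/frames/example_t0001.png"  # placeholder frame
modes = ["A", "B", "C"]                       # placeholder label set
predicted_mode = "A"                          # placeholder prediction

st.image(frame_path, caption=f"Predicted Mode: {predicted_mode}")
corrected = st.selectbox("Mode", modes, index=modes.index(predicted_mode))

if st.button("Save verdict"):
    pd.DataFrame([{"frame": frame_path, "mode": corrected}]).to_csv(
        "feedback.csv", mode="a", header=False, index=False
    )
    st.success("Feedback stored.")
```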
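
Sketch 9 folds verified corrections back into the training labels ahead of a scheduled retraining run. The file names and the single corrected column are assumptions; the retrain itself would be triggered by whatever scheduler the team uses.

```python
# Sketch 9: merge reviewer corrections into the training labels before a
# retraining run. labels.csv and feedback.csv are hypothetical files.
import pandas as pd

labels = pd.read_csv("labels.csv")                               # current training labels
feedback = pd.read_csv("feedback.csv", names=["frame", "mode"])  # verified corrections

# Corrections override the stored labels for any matching frame.
merged = labels.set_index("frame")
merged.update(feedback.drop_duplicates("frame", keep="last").set_index("frame"))
merged.reset_index().to_csv("labels.csv", index=False)
```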

Benefits

  • Information regarding our Benefits can be found at https://hr.untsystem.edu/benefits/index.php.