Data Scientist I

University of Florida, Gainesville, FL
$61,000 - $75,000

About The Position

Minimum qualifications: a Bachelor's degree in data science, statistics, bioinformatics, analytics, or a similar field and two years of relevant experience; or a Master's degree in data science, statistics, bioinformatics, analytics, or a similar field.

Essential functions:

  1. Design, implement, and maintain scalable data ingestion and processing pipelines supporting the NeuroEnclave within UF's HIPAA-aligned computing environment.
  2. Develop and maintain data validation, profiling, and quality control workflows to ensure data integrity, provenance, and reproducibility across datasets.
  3. Engineer and optimize high-performance data workflows for large-scale biomedical datasets using Python-based tools and parallel computing frameworks.
  4. Standardize and harmonize heterogeneous data formats to support integrated analytics, AI/ML workflows, and cross-dataset interoperability.
  5. Implement technical controls supporting IRB-, HIPAA-, and NIH-compliant data access, including containerized environments, access controls, and audit-ready workflows.

Requirements

  • A Bachelor's degree in data science, statistics, bioinformatics, analytics, or a similar field and two years of relevant experience; or
  • A Master's degree in data science, statistics, bioinformatics, analytics, or a similar field

Nice To Haves

  • Experience working with clinical or biomedical research data.
  • Familiarity with high-performance computing (HPC) or secure research computing environments.
  • Experience with parallel computing frameworks (e.g., Dask or similar).
  • Knowledge of data security, privacy, and compliance considerations (HIPAA, IRB, NIH Data Management & Sharing requirements).
  • Experience supporting data infrastructure for AI/ML or advanced analytics.
  • Prior experience in a research or academic data environment.

Responsibilities

  • Design, implement, and maintain scalable data ingestion and processing pipelines supporting the NeuroEnclave within UF’s HIPAA-aligned computing environment.
  • Develop and maintain data validation, profiling, and quality control workflows to ensure data integrity, provenance, and reproducibility across datasets.
  • Engineer and optimize high-performance data workflows for large-scale biomedical datasets using Python-based tools and parallel computing frameworks.
  • Standardize and harmonize heterogeneous data formats to support integrated analytics, AI/ML workflows, and cross-dataset interoperability.
  • Implement technical controls supporting IRB-, HIPAA-, and NIH-compliant data access, including containerized environments, access controls, and audit-ready workflows.
© 2024 Teal Labs, Inc