Data Scientist

The University of Texas at Arlington Portal | Arlington, TX
6d | Onsite

About The Position

The Data Scientist designs, builds, and operates robust, secure data pipelines that power clinical-research products, analytics dashboards, and downstream data-science workloads. The role partners closely with clinicians, investigators, the Office of Information Technology and collaborators’ Business Intelligence teams, and external research collaborators to translate complex biomedical data into actionable insights.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • Seven (7) years of professional experience in data engineering or software development, or an equivalent mix of education and relevant experience in a similar role.

Nice To Haves

  • Experience with Snowflake, Microsoft Azure Synapse, or other modern data-warehouse platforms.
  • Exposure to machine-learning pipelines (e.g., using OpenAI or other LLM services).
  • Experience building/maintaining cloud data platforms (such as GCP, OCI, Linode, AWS, Azure) and data-lake/warehouse solutions, as well as production workload management.
  • Hands-on Linux system administration (containerization, networking, security).

Responsibilities

  • Architect end-to-end pipelines that ingest high-volume de-identified clinical, genomic and phenotypic datasets from collaborators’ EHR systems (Epic Clarity/Caboodle) and cloud storage.
  • Build and host production-grade web portals and REST APIs for secure researcher/clinician access supporting role-based permissions and audit trails.
  • Leverage OpenAI LLMs (or similar NLP services) to auto-extract Human Phenotype Ontology (HPO) terms from de-identified clinical documentation.
  • Design high-throughput ETL workflows that parse heterogeneous datasets for ingestion into relational databases and cloud-native warehouses, feeding results into downstream analytics pipelines.
  • Design and develop real-time-capable analytical systems that integrate with and/or augment EHR systems.
  • Perform systems administration for data-platform hosts, including system hardening, patch management, and firewall configuration.
  • Implement monitoring stacks and custom health checks to maintain near-continuous system availability.
  • Translate clinical research requirements into technical specifications, producing clear data-model diagrams, lineage documentation, and data-dictionary artifacts.
  • Deliver data-product demos to investigators, showing how pipeline outputs support precision-medicine reporting.
  • Champion standards for metadata management, schema versioning, and test-driven data engineering.
  • Other duties as assigned.