Clinical Development Data Science Intern

Genmab
Princeton, TX
Hybrid

About The Position

The Clinical Development Data Science Intern will join Genmab’s Clinical Development Data Science team and contribute to innovative AI/ML initiatives across the clinical development lifecycle. This role is ideal for a highly technical, impact-driven data scientist who is pursuing a Master's or PhD in a quantitative or computational discipline, and who is excited to apply modern machine learning, natural language processing, and generative AI to real clinical and scientific challenges in oncology. The intern will support projects such as building AI-powered tools to assist trial design, developing analytics pipelines for clinical and safety data, and prototyping advanced models that help clinicians and scientists make better, faster decisions. You will collaborate with experienced data scientists, data engineers, clinicians, and other cross-functional partners to deliver meaningful analyses, prototypes, and decision-support tools used across clinical development.

Requirements

  • Sustained progress in a PhD program in a quantitative or computational discipline (Computer Science, Data Science, Statistics, Physics, Computational Chemistry, Computational Biology, Applied Mathematics, Engineering, or related quantitative/technical field). Master’s students with strong applied experience will also be considered.
  • Strong proficiency in modern Python for machine learning and scientific computing (e.g., PyTorch/TensorFlow, scikit-learn, spaCy, Hugging Face transformers, pandas, NumPy). Candidates should be experienced in building modular scripts and reusable modules, command-line utilities, and structured codebases for data processing and machine learning workflows.
  • Experience applying ML or NLP methods to real-world datasets.
  • Ability to design, build, and evaluate computational models end-to-end.
  • Excellent analytical problem-solving abilities and intellectual curiosity.
  • Clear written and verbal communication skills.
  • High attention to detail, scientific rigor, and commitment to producing high-quality work.
  • Ability to work independently as well as collaboratively within interdisciplinary teams.

Nice To Haves

  • Experience with large language models, retrieval-augmented generation (RAG), or generative AI workflows.
  • Prior work involving unstructured document parsing (e.g., PDFs, scanned text, scientific literature).
  • Full-stack data science skill set, including experience with:
      • Writing production-quality, modular, and reusable Python code
      • Building scalable, reproducible systems and packaging code for broader use
      • Containerization (e.g., Docker) and reproducible compute environments
      • Workflow or pipeline orchestration tools (Airflow, Prefect, Dagster, etc.)
      • Cloud computing environments (AWS, Azure, GCP)
  • Experience using Git repositories and version-control workflows, including:
      • Branching strategies, pull requests, and peer code reviews
      • Good commit hygiene and collaborative development practices
      • Familiarity with GitHub, GitLab, or similar platforms
  • Exposure to CI/CD, structured logging, automated testing, or other professional engineering practices.
  • Experience with MLOps concepts such as model packaging, monitoring, reproducibility, and deployment is a strong plus.

Responsibilities

  • Develop machine learning and NLP pipelines to extract, organize, and summarize information from clinical trial protocols, regulatory documents, and unstructured biomedical text.
  • Support the development of AI-driven tools that:
      • Accelerate protocol design through information retrieval, extraction, and summarization.
      • Enhance pharmacovigilance and safety monitoring through risk modeling, anomaly detection, and severity prediction.
  • Apply advanced modeling techniques including transformers, generative models, survival analysis, clustering, and other ML approaches.
  • Experiment with explainability methods (e.g., SHAP, LIME) to improve the interpretability of models used in clinical decision-support contexts.
  • Contribute to documentation, model evaluation, technical reviews, and final presentations to internal stakeholders.