Sr Data Scientist I

Advarra

Remote

About The Position

The AI Data Scientist will focus on optimizing, evaluating, and operationalizing advanced machine learning models within Advarra’s Braid platform—the intelligence layer connecting data, insights, and products across the clinical research ecosystem. This role emphasizes improving and fine-tuning large language models (LLMs) using proprietary datasets to enhance precision, recall, and contextual relevance across clinical and operational data.

Requirements

MS in Machine Learning, Computer Science, or related quantitative discipline, or equivalent relevant work experience.
5+ years of hands-on experience developing and fine-tuning ML models or LLMs.
Demonstrated expertise in Python, with working knowledge of a widely used deep learning framework such as PyTorch.
Hands-on experience developing, managing, and troubleshooting workflows within Databricks for data engineering, analytics, and machine learning projects.
Demonstrated strong understanding of the ML lifecycle.
Experience with embeddings and retrieval-augmented generation (RAG).

Nice To Haves

PhD in Machine Learning, Computer Science, or a related quantitative discipline.
Previous experience excelling in a fast-paced, applied research setting where experimentation, iteration, and roadmap alignment are critical.
Experience with causal inference, simulation modeling, or graph-based reasoning applied to clinical development or biomedical research.
Hands-on fluency in Databricks notebooks for exploratory analysis, model development, and workflow orchestration.
Curiosity about how AI training and inference performance affects both user experience and downstream business value.
Mindset of continuous learning, with the ability to bridge experimental work and high-value product applications.

Responsibilities

Focus on understanding existing models, assessing their performance, selecting optimal architectures, and fine-tuning them to meet specific domain and business needs—including retrieval-augmented generation (RAG) based applications.
Collaborate closely with data engineering, product, and domain teams to translate real-world research challenges into scalable, model-driven solutions that accelerate Advarra’s vision of a digitally connected research data and technology fabric.
Optimize and fine-tune large language models (LLMs) and domain-specific variants using proprietary datasets to achieve precision and recall targets that drive differentiated customer value.
Evaluate model performance across key metrics and benchmarks, identifying strengths, weaknesses, and opportunities for improvement across predictive, generative, and retrieval-augmented tasks.
Implement and operationalize LLM-based and retrieval-augmented generation (RAG) systems that enhance Braid-powered products such as Study Design and Site Feasibility.
Collaborate with data engineering to ensure scalable, efficient model training, evaluation, and deployment pipelines using Databricks, MLflow, and Delta Lake.
Assess and select models—open-source or proprietary—that best align with domain-specific requirements and Advarra’s regulated research environment.
Partner with clinical and operational experts to translate research and trial challenges into measurable model evaluation frameworks and optimization strategies.
Conduct model interpretability and bias analyses to ensure fairness, transparency, and compliance with governance standards.
Document methodologies and validation results to support internal governance, reproducibility, and audit readiness.
Contribute to reusable fine-tuning workflows, evaluation frameworks, and model monitoring pipelines within the Braid AI stack.
Stay at the forefront of advancements in LLM optimization, retrieval augmentation, and multi-modal learning, applying new methods that improve scalability, explainability, and cost efficiency.