Staff Data Scientist, Machine Learning in Epidemiology and Patient Data Products

Valo Health | Lexington, MA
$165,000 - $220,000 | Remote

About The Position

Valo Health is a human-centric, AI-enabled biotechnology company focused on accelerating the discovery and development of new medicines for patients. Its Opal Computational Platform integrates real-world data, AI, human translational models, and predictive chemistry. Valo is committed to fostering diversity, growth, and an inclusive environment, encouraging collaboration among people with varied experiences and backgrounds to drive patient-centric innovation.

The Staff Data Scientist will be a key member of the data science team, responsible for building a computational platform to advance drug discovery and development. The role involves developing machine learning tools for patient data and promoting their adoption across teams, guided by epidemiology and biology program leads. The successful candidate will collaborate with a diverse group of scientists and domain experts in an innovative startup setting that transcends traditional industry boundaries.

Requirements

  • MS, MPH, or PhD in health data science, biostatistics, or a related quantitative field.
  • At least 5 years of experience developing and applying ML methods.
  • At least 3 years working directly with real-world patient data.
  • Extensive experience developing and implementing machine learning solutions using healthcare databases, including EHRs, administrative claims, and patient registries.
  • Familiarity with U.S. and global medical coding ontologies and data models (ICD, ATC, LOINC, SNOMED, CPT, HCPCS, OMOP, etc.).
  • Confident working with highly sparse and high-dimensional data.
  • Extensive experience building, maintaining, and operationalizing ML pipelines, and translating model outputs into meaningful insights for diverse audiences.
  • Broad proficiency across core ML paradigms (e.g., supervised, unsupervised, semi-supervised) and experience with linear and logistic regression, classification and tree-based methods, clustering and dimensionality-reduction techniques, and deep learning architectures.
  • Strong grounding in key components of the ML development lifecycle, including evaluation metrics, hyperparameter tuning, model selection, feature engineering and selection, model explainability, and MLOps best practices.
  • Mastery of Python and modern data science tools (e.g., scikit-learn, PyTorch, statsmodels, SciPy, MLlib, MLflow).
  • Comfortable working in ambiguous problem spaces; experience in a startup or agile environment as part of cross-functional project teams.
  • Ability to lead and facilitate meetings and work collaboratively on multi-disciplinary project teams.
  • Exceptional time management, with the ability to prioritize multiple tasks simultaneously and deliver products on time.
  • Enthusiastic about documentation: ensures all analyses are clear and reproducible, with key assumptions and decision points thoroughly recorded.
  • Permanent US work authorization without the need for immediate or future sponsorship.

Nice To Haves

  • Experience in a biopharmaceutical, epidemiological, or biostatistical setting.
  • Experience processing and mining clinical notes.
  • Hands-on experience with representation learning and with sequence models, including transformer-based architectures.
  • Experience with AI-assisted coding tools (e.g., Claude Code).
  • Advanced knowledge of biostatistical approaches, including inferential and predictive modeling.
  • Experience with causal inference methods for observational studies, including propensity score methods, bias adjustment, and covariate selection and adjustment.
  • Familiarity with or exposure to traditional drug discovery and development processes and approaches.

Responsibilities

  • Lead the development of machine learning (ML) methods and analyses of patient data in partnership with diverse stakeholders, integrating clinical insights into supervised and unsupervised learning approaches and generating patient profiles.
  • Perform project-specific hands-on analysis and modeling of high-dimensional longitudinal real-world data, including electronic health records (EHRs), clinical notes, sequencing data, and multi-omics, using modern data science tools in cloud environments.
  • Contribute to the design, implementation, and evaluation of innovative machine learning approaches for patient data to provide novel clinical insights.
  • Be comfortable with scientific uncertainty and embrace curiosity and creative solutions, as many challenges lack known solutions or established pathways.
  • Use technical knowledge and intuition to frame large problems and break them down into solvable pieces, prioritizing critical-path issues.
  • Act as a dynamic and active team member, championing shared coding standards, participating in code reviews, and providing regular updates and input on colleagues' work.

Benefits

  • Healthcare coverage
  • Annual incentive program
  • Retirement benefits
  • A broad range of other benefits