Senior Data Scientist

Probably Genetic
San Francisco, CA
$180,000 - $230,000
Hybrid

About The Position

We are looking for a Senior Data Scientist to own some of the most consequential diagnostic AI in rare disease. You will build, validate, and operationalize the models that help us find and diagnose patients who have never had a name for their disease, bring analytical rigor to our testing programs, and shape how we use data to make smarter product decisions.

Requirements

  • 7+ years of experience in data science, machine learning engineering, or a closely related field
  • Strong Python proficiency and fluency across the core data science stack: pandas, NumPy, scikit-learn, PySpark, and SQL
  • Demonstrated end-to-end ML experience: you have taken models from problem definition through feature engineering, validation, deployment, and monitoring in a production environment
  • Experience with NLP techniques and applying language models to real-world problems
  • Comfort with prompt engineering and evaluating external AI API performance (e.g., OpenAI)
  • A track record of operating with high ownership in lean, fast-moving environments where you have had to build structure as much as execute within it
  • Strong analytical communication skills: you can translate complex model outputs and data findings into clear, actionable narratives for technical and non-technical audiences alike

Nice To Haves

  • Experience with Databricks or similar lakehouse/ML platform environments
  • Familiarity with synthetic data generation techniques
  • Domain knowledge in healthcare, rare disease, genomics, or clinical research
  • Experience with MLOps tooling and building observability infrastructure from scratch
  • Exposure to biopharma or insurance analytics use cases

Responsibilities

  • Own the end-to-end development, validation, and operationalization of PG's predictive diagnostic AI models, from feature engineering through production deployment, that power program eligibility decisions and clinical decisions for patients
  • Run prospective testing experiments: apply diagnostic models to undiagnosed patients, coordinate testing, and track outcomes to continuously improve model performance
  • Build and maintain PG's synthetic patient data pipeline, a critical deliverable for our research programs and a key input to our own model development lifecycle
  • Optimize our patient intake experience using NLP and multimodal data analysis to determine which questions to ask, in what order, to maximize data quality and conversion
  • Own API usage and cost optimization across PG's AI stack, including prompt engineering, model evaluation, and ongoing performance monitoring
  • Conduct ad hoc strategic analyses that inform product prioritization, support causality assessment, and generate customer-facing program insights
  • Establish MLOps infrastructure: model monitoring, drift detection, API observability, and lightweight but durable operational processes
  • Pursue blue-sky research initiatives aimed at creating value from our data
  • Work with Data Engineering to build a robust, scalable data foundation that supports all of the above

Benefits

  • Fair and equitable compensation with competitive early-stage equity grants
  • Generous flexible time-off policy that we actually use
  • Parental leave benefits (12 weeks for both birthing and non-birthing parents)
  • Hybrid, flexible work with high trust and autonomy
  • A bright, inviting, pet-friendly office in Downtown SF near transit
  • A “work from anywhere” policy, up to 4 weeks a year
  • Regular team retreats in exciting destinations
  • Health benefits including medical, dental, vision, and therapy, plus FSA and 401(k)
© 2026 Teal Labs, Inc