Lead Data Engineer - Clinical AI

Qualified Health
Palo Alto, CA
$160,000 - $210,000
Hybrid

About The Position

Qualified Health is seeking a Principal (Lead) Data Engineer to lead the development of our clinical intelligence layer. In this senior technical role, you'll design and build the data transformation pipelines that convert raw clinical data into AI-ready features, enabling our platform to deliver faster, more accurate clinical insights. You'll work closely with clinical SMEs to translate medical knowledge into scalable data systems, and partner with our AI team to define the data contracts that power LLM-based workflows. This is a technical leadership role for someone who thrives on solving complex data challenges and takes pride in building reliable, production-grade systems.

Requirements

  • 8+ years of data engineering experience, with demonstrated expertise building production data pipelines
  • 5+ years on Databricks, including PySpark, Delta Lake, and Unity Catalog
  • Healthcare data experience: Prior work with FHIR APIs, EHR databases, or claims data
  • Clinical text processing experience: Experience building pipelines that extract entities from unstructured clinical notes using tools like spaCy, medSpaCy, or cloud NLP services
  • Feature engineering for ML/AI: Experience preparing data for machine learning models or LLM consumption
  • Data quality mindset: Track record implementing validation frameworks and monitoring for data pipelines
  • Healthcare terminology: Familiarity with ICD-10, RxNorm, SNOMED CT, LOINC
  • Epic Clarity experience: Direct work with Epic's relational database structure

Nice To Haves

  • Azure cloud platform: Hands-on with Azure Databricks, Data Lake Storage, Service Bus
  • Clinical NLP tools: Experience with Azure Text Analytics for Health, Amazon Comprehend Medical, or similar
  • RAG architecture patterns: Understanding of vector databases and retrieval-augmented generation

Responsibilities

  • Design and build clinical annotation pipelines that extract conditions, medications, and procedures from unstructured clinical notes
  • Implement negation and temporal detection to distinguish current conditions from historical findings (critical for clinical decision-making)
  • Build business rules engines that classify medications, calculate risk scores, and apply clinical logic at scale
  • Integrate clinical reference data (drug databases, terminology mappings) into transformation pipelines
  • Optimize data structures to reduce LLM processing time and improve downstream AI performance
  • Build production-grade pipelines using PySpark and Databricks for large-scale clinical data processing
  • Implement data quality frameworks to validate clinical transformations and catch issues before they reach AI workflows
  • Design feature stores that serve pre-computed clinical features to ML models and LLM applications
  • Maintain pipeline observability with monitoring, alerting, and performance tracking
  • Partner with clinical SMEs to translate medical knowledge into data transformation logic
  • Define data contracts with the AI team to ensure feature outputs meet LLM workflow requirements
  • Contribute to technical standards and best practices for clinical data engineering

Benefits

  • Competitive salaries with equity packages
  • Robust medical/dental/vision insurance
  • Flexible working hours
  • Hybrid work options
  • Inclusive environment that fosters creativity and innovation