Lead Data Engineer, Health Data Platforms

Guidehouse · Bethesda, MD
$113,000 - $188,000

About The Position

We are searching for a Lead Data Engineer, Health Data Platforms, to support future opportunities. The person filling this role will build, scale, and maintain national-scale data pipelines, integrations, and storage solutions that power laboratory, research, and healthcare operations. They will lead the modernization and harmonization of diverse clinical data, ensuring interoperability, data quality, and compliance across platforms. This position will be based in Bethesda, MD.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, Data Engineering, Bioinformatics, or related field.
  • A minimum of EIGHT (8) years of hands-on data engineering experience.
  • Deep, practical knowledge of healthcare data, common data models (OMOP, FHIR), and clinical terminologies (such as ICD, SNOMED, RxNorm).
  • Solid experience with big data technologies (Apache Spark, Hadoop), containerization (Docker), and reproducible/scalable workflows.
  • Hands-on experience with cloud services and REST API integrations.
  • Strong SQL, Python, and version control (Git) skills.
  • Experience with privacy-preserving record linkage (PPRL), federated data systems, and regulated environments (FISMA, HIPAA).

Nice To Haves

  • Master’s degree preferred.
  • Experience designing and deploying data solutions on cloud platforms (AWS, GCP, Azure).
  • Proficiency with workflow management systems (Nextflow, Snakemake, Airflow).
  • Experience with regulated environments (GxP, 21 CFR Part 11) and data governance.
  • Proficiency with modern data tools (e.g., Spark/Databricks, Airflow, dbt, Kafka).

Responsibilities

  • Architect and modernize robust, disease-agnostic data acquisition and ingestion pipelines for large-scale, heterogeneous healthcare data (e.g., EHRs, claims, registries, geospatial data).
  • Design, implement, and maintain ETL/ELT pipelines across cloud platforms (AWS, Azure, GCP).
  • Design and maintain scalable, reliable, and flexible applications using TypeScript, NodeJS, Angular, and RESTful web services to support data workflows.
  • Integrate data sources including ELN, LIMS, sample tracking software, web-based portals, and REST APIs.
  • Maintain and enhance data harmonization pipelines (e.g., OMOP conversions); improve interoperability among data models including OMOP, PCORnet, and FHIR; and ensure consistency and alignment of critical data types to support master data integration and harmonization.
  • Implement and manage data storage solutions (data lakes, warehouses) utilizing the appropriate partitioning, security, and lifecycle policies.
  • Champion data quality and governance standards through the development of sophisticated data quality frameworks, dashboards, and feedback loops to ensure transparency in data completeness, consistency, and quality for partners and researchers.
  • Optimize pipeline performance, reliability, and cost by establishing monitoring and alerting functions.
  • Innovate with advanced technologies: integrate new data sources (e.g., national mortality data, CMS), link datasets, and build processes for novel data types (geospatial, environmental).
  • Collaborate with informatics, bioinformatics, and platform teams to define data models and SLAs.
  • Provide technical leadership and mentorship.
  • Translate scientific needs into technical solutions in an agile, mission-focused environment.
  • Implement CI/CD for data workflows and infrastructure-as-code (e.g., Terraform, ARM, CloudFormation).
  • Document data architectures, lineage, and standards.

Benefits

  • Medical, Rx, Dental & Vision Insurance
  • Personal and Family Sick Time & Company Paid Holidays
  • Parental Leave
  • 401(k) Retirement Plan
  • Group Term Life and Travel Assistance
  • Voluntary Life and AD&D Insurance
  • Health Savings Account, Health Care & Dependent Care Flexible Spending Accounts
  • Transit and Parking Commuter Benefits
  • Short-Term & Long-Term Disability
  • Tuition Reimbursement, Personal Development, Certifications & Learning Opportunities
  • Employee Referral Program
  • Corporate Sponsored Events & Community Outreach
  • Care.com annual membership
  • Employee Assistance Program
  • Supplemental Benefits via Corestream (Critical Care, Hospital Indemnity, Accident Insurance, Legal Assistance and ID theft protection, etc.)
  • Position may be eligible for a discretionary variable incentive bonus