About The Position

This role focuses on building and optimizing large-scale data pipelines and search experiences within the healthcare data domain. The Senior Software Engineer will collaborate with various teams to translate requirements into robust data solutions, design and build batch and streaming pipelines, develop data models and backend services, optimize data and search performance, drive engineering excellence, and pioneer new technologies in data engineering and information retrieval.

Requirements

  • Proven experience designing and orchestrating large-scale ETL/ELT pipelines using Apache Beam/Google Cloud Dataflow (or similar) and dbt on modern cloud data warehouses.
  • 4+ years of experience working with relational databases and analytical data warehouses, with advanced SQL skills and solid data-modeling fundamentals (e.g., dimensional and normalized modeling).
  • Working experience with search indexing and Elasticsearch, including index management, mappings, and building and maintaining search indices from pipeline output.
  • Experience building scalable Python services and high-performance data APIs, including developing Model Context Protocol (MCP) servers that expose data and tooling to downstream and AI consumers.
  • Strong understanding of containerization (Docker), CI/CD methodologies (e.g., GitHub Actions), Git, Infrastructure as Code (e.g., Terraform/Pulumi), and managing services within cloud platforms (3+ years of GCP experience preferred).
  • Bachelor’s degree.
  • 5+ years of professional experience with Python, with strong software-engineering fundamentals (testing, code review, design).
  • 3+ years of experience with Java or another JVM language is also highly desired, particularly for Beam/Dataflow.

Nice To Haves

  • BigQuery experience is a plus.
  • Familiarity with hybrid (BM25 + semantic/vector) search is a plus.
  • Familiarity with healthcare data standards (e.g., NPPES/NPI registries, NUCC Provider Taxonomy, machine-readable files (MRFs) for cost transparency, and FHIR).
  • Experience with data quality and pipeline testing frameworks (e.g., dbt tests, Great Expectations) and streaming/event ingestion (e.g., Pub/Sub, Kafka).
  • Experience integrating graph-based data and healthcare taxonomy ontologies to enrich datasets and search query context.
  • Experience with observability and logging platforms (e.g., Datadog) for monitoring pipeline health and data freshness.

Responsibilities

  • Partner with product managers, data engineers, and business leaders to translate complex product and data requirements into scalable, reliable data pipelines and the search experiences they power.
  • Design, build, and optimize large-scale distributed batch and streaming pipelines (using Apache Airflow, Apache Beam/Dataflow, and dbt on BigQuery) to ingest, model, and transform high-volume healthcare data into clean, well-tested, query-ready datasets and search indices.
  • Develop resilient Python services and dbt models that power data delivery and self-service analytics, including Model Context Protocol (MCP) servers that expose curated data and tooling to downstream and AI consumers, and integrate with external REST/SOAP APIs and third-party data sources.
  • Tune pipeline throughput, data warehouse performance, and search indexing in depth, optimizing BigQuery cost and query performance as well as Elasticsearch index design to ensure data freshness, relevance, and scalability across high-volume datasets.
  • Write clean, maintainable, well-tested code and lead by example through rigorous code reviews, architectural and data-modeling design discussions, and mentoring, driving a culture of high-quality software and trustworthy data.
  • Stay at the forefront of modern data engineering, the analytics-engineering ecosystem (e.g., dbt, BigQuery), and information retrieval, proactively applying these advancements to strengthen our data platform and the products it powers.
© 2026 Teal Labs, Inc