Staff Data Engineer

Imagine Pediatrics

About The Position

Imagine Pediatrics is a tech-enabled, pediatrician-led medical group focused on reimagining care for children with special health care needs. They provide 24/7 virtual-first and in-home medical, behavioral, and social care, collaborating with families, providers, and health plans to overcome barriers to quality care. They enhance existing care teams by offering an additional layer of support with compassion, creativity, and a strong commitment to children with medical complexity.

As a Staff Data Engineer, you will be the first dedicated Data Engineer on a hybrid team that includes Analytics Engineers. Your primary responsibilities will be defining data flow within the platform and managing the data pipelines that support clinical analytics, operational reporting, and external integrations. You will ensure that data ingestion and integration decisions are made with a clear understanding of their impact on downstream analytical usage, including data freshness, granularity, and structure. This role requires close collaboration with Analytics Engineers, Product Engineers, and Platform Engineers to build a robust platform for a high-growth, mission-driven healthcare organization.

The ideal candidate has strong technical expertise in data engineering, is comfortable navigating ambiguous problem spaces, and can influence across engineering and product teams. You should be driven by curiosity, committed to improving pediatric healthcare, and able to thrive in a fast-paced startup environment, bringing both engineering depth and a collaborative mindset. You will enjoy end-to-end ownership of systems, think holistically rather than only about pipelines, and work comfortably across infrastructure, data movement, and analytics workflows. Proactive and collaborative, you are not afraid to challenge unclear assumptions, communicate trade-offs, and guide teams toward better technical decisions.

Requirements

  • 7–10+ years of data engineering or platform engineering experience, including 2+ years in a senior or staff-level role owning production data systems.
  • Strong experience designing data pipelines using Python and SQL.
  • Strong experience with AWS services including Lambda, SQS, SNS, and S3.
  • Strong experience building event-driven and API-based ingestion systems (e.g., webhooks, asynchronous processing, or CDC patterns).
  • Experience with data orchestration tools such as Dagster (or similar).
  • Experience working with infrastructure-as-code (Terraform), primarily extending and adapting existing modules and patterns.
  • Experience with cloud data warehouses, preferably Snowflake, including performance-aware SQL development.
  • Proficiency in at least one scripting language beyond SQL and Python (JavaScript, TypeScript, or Go) for automation, tooling, or serverless functions.
  • Demonstrated use of modern software engineering practices including version control, CI/CD, testing, and code review.
  • Proven ability to troubleshoot complex data and infrastructure issues across multiple systems and clearly communicate findings to both technical and non-technical stakeholders.
  • Proven ability to reason about downstream analytical impact of data pipeline design, including data freshness, grain, and transformation behavior.
  • Experience working closely with analytics engineering, data modeling, or similar downstream consumers of data.

Nice To Haves

  • Experience designing or managing IAM policies and least-privilege access models across data platform services.
  • Experience with dbt or modern analytics engineering workflows.
  • Experience working with healthcare data, FHIR resources, or clinical systems.
  • Familiarity with HIPAA compliance and handling of PHI in cloud environments.
  • Experience with high-volume ingestion systems including webhook-based tools (e.g., Hevo, Fivetran, or similar).
  • Experience driving the adoption of AI tools to improve engineering productivity.
  • Exposure to real-world evidence (RWE), health economics and outcomes research (HEOR), or similar evidence-generation programs.

Responsibilities

  • Design, build, and maintain scalable ELT pipelines that ingest data from clinical systems, APIs, and third-party integrations using webhook-based, API-based, and CDC (change data capture) approaches.
  • Architect and manage event-driven data pipelines in AWS — including cross-account configurations and dead-letter queue handling.
  • Write and maintain infrastructure-as-code to deploy and manage data ingestion workloads, primarily extending existing modules and patterns.
  • Orchestrate pipeline execution and monitoring using Dagster, ensuring observability and reliability across all workflows.
  • Implement data quality checks, alerting, and lineage tracking across the pipeline.
  • Identify and eliminate systemic failure modes in pipelines, improving reliability through long-term fixes rather than repeated incident remediation.
  • Partner with Analytics Engineers to ensure upstream data supports correct and consistent downstream models.
  • Set technical direction for data architecture and mentor other engineers.

Benefits

  • Competitive medical, dental, and vision insurance
  • Healthcare and Dependent Care FSA; Company-funded HSA
  • 401(k) with 4% match, vested 100% from day one
  • Employer-paid short and long-term disability
  • Life insurance at 1x annual salary
  • 20 days PTO + 10 Company Holidays & 2 Floating Holidays
  • Paid new parent leave
  • Additional benefits to be detailed in offer


What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Education Level: No Education Listed
  • Number of Employees: 11-50 employees
