Principal Data Platform Engineer

Blink Health
New York, NY

About The Position

We are seeking a Principal Data Platform Engineer to define and evolve our real-time and batch data platform built on AWS and Databricks. This role owns the technical vision for how data is ingested, processed, stored, and served as trusted datasets, metrics, and APIs that power products, decisioning systems, and operational workflows. As a Principal, you are a technical authority and force multiplier—deeply hands-on while setting architectural direction across streaming systems, lakehouse design, and data-serving layers. You will partner closely with engineering, analytics, and product teams to simplify the platform, eliminate legacy patterns, and establish scalable, reliable foundations for real-time analytics.

Requirements

  • Deep expertise in real-time and distributed data systems, including AWS Kinesis and Spark Structured Streaming
  • Strong command of Databricks on AWS (Delta Lake, clusters, jobs) and core AWS services (S3, IAM, networking)
  • Proven experience designing data-serving architectures and APIs for analytics, metrics, and feature consumption
  • Advanced Python and SQL skills for building scalable, high-performance data pipelines
  • Demonstrated ability to design idempotent, replayable, and observable data platforms at scale

Responsibilities

  • Own the end-to-end data platform architecture, spanning streaming ingestion, lakehouse storage, and data/insight serving layers
  • Architect real-time streaming systems using AWS Kinesis and Spark Structured Streaming to support low-latency use cases
  • Design stream-to-lakehouse convergence patterns that unify real-time and historical data with strong correctness guarantees
  • Build and evolve data, metrics, and feature APIs that expose curated datasets for downstream applications and analytics
  • Establish canonical event schemas and data contracts to support event-driven and API-based consumption
  • Make deep technical decisions across AWS infrastructure (Kinesis, S3, IAM, networking) and Databricks internals (clusters, jobs, Delta Lake, performance tuning)
  • Drive platform modernization, retiring legacy tools and patterns in favor of simpler, lakehouse-first designs
  • Set standards for high-performance SQL and Spark workloads, optimizing for cost, latency, and scale
  • Lead complex platform initiatives from architecture through production delivery and ongoing reliability
  • Provide technical leadership and mentorship, shaping best practices for platform design, data quality, and operability