Senior Data Engineer

Raylu AI•New York City, NY

51d•Onsite

About The Position

We’re hiring a Senior Data Engineer to own data at truly massive scale. You’ll design and run pipelines that clean, enrich, and serve data spanning hundreds of attributes across 80M+ companies and 800M+ people. The role blends classic data engineering with data operations, vendor/BPO orchestration, and data partnerships.

Core stack: Python, Dagster, DuckDB
Pipelines at scale: Build resilient ELT/ETL with strong contracts, idempotency, and lineage.
Data operations: Set quality bars, manage BPO workflows, and run SLAs with external data partners.
Serving & access: Position data for production use through serving infrastructure, documentation, and SLAs for internal consumers.
Cost & performance: Tune storage/compute and keep a sharp eye on unit economics.
Opinionated: Bring a deep understanding of the technological landscape, making both high-level system and granular code design decisions based on understanding rather than preference, and diving deep on unfamiliar patterns in order to build the best product.

Requirements

Python
Dagster
DuckDB
Building resilient ELT/ETL with strong contracts, idempotency, and lineage.
Set quality bars, manage BPO workflows, and run SLAs with external data partners.
Position data for production use through serving infrastructure, documentation, and SLAs for internal consumers.
Tune storage/compute and keep a sharp eye on unit economics.
Deep understanding of the technological landscape, making both high-level system and granular code design decisions based on understanding rather than preference, and diving deep on unfamiliar patterns in order to build the best product.

Nice To Haves

Experience with big data, columnar storage formats, vector indexes, and privacy/compliance in data products.

Responsibilities

Own end-to-end data flows: ingestion, normalization, entity resolution, enrichment, and delivery.
Stand up monitoring for freshness, completeness, and accuracy; drive RCA and prevention.
Build internal tools that make data discoverable and usable by engineering and product.
Recruit, onboard, and manage BPO vendors; negotiate and run data partnerships.