Sr. Data Engineer (Contract to Perm)

Locality • New York, NY

6d • $75 - $100 • Onsite

About The Position

Locality is seeking a Senior Data Engineer who will serve as the primary internal owner of our Audience Engine, spanning data ingestion, identity resolution, automation, and AI enablement. This role emphasizes hands-on platform stewardship today, with a clear path to supporting future expansion of the HH Identity Graph and applied AI use cases.

The Senior Data Engineer will establish and own the technical management of the Audience Engine, transitioning knowledge from external partners and building a scalable, internally managed data platform. The goal is to reduce reliance on third-party vendors while maintaining high standards for platform reliability, uptime, and data integrity across all pipelines and integrations.

This role involves architecting, owning, and continuously optimizing Databricks Bronze → Silver → Gold pipelines to support large-scale audience ingestion, enrichment, and activation workflows, ensuring the platform is scalable and cost-efficient through effective pipeline optimization and resource management. The engineer will proactively monitor, troubleshoot, and resolve ingestion failures, schema drift, and data quality issues across batch and near real-time data pipelines.

A key responsibility is leading identity resolution across datasets including HHID, DSID, MAID, IP, and campaign logs, improving matching logic to strengthen the Household Identity Graph. The role guarantees accurate and timely audience publication by maintaining robust downstream data delivery processes and SLAs across activation platforms and partners. It also involves owning and maintaining integrations with external data and activation partners (e.g., Experian, FreeWheel, DSPs), ensuring seamless data exchange and operational reliability. The engineer will partner cross-functionally with Product, Analytics, and Ad Operations teams to onboard new data sources and activation endpoints, enabling new use cases and revenue opportunities.

Additionally, the role supports and operationalizes applied AI/ML workflows within Databricks, including integration with tools like Akkio, and translates business needs into production-ready data features and datasets. The engineer will design and implement scalable data models and query-ready datasets that power BI reporting, audience insights, and predictive modeling, ensuring alignment with downstream analytics requirements. Finally, the role enforces data governance, lineage, and documentation standards while ensuring compliance with privacy, consent, and data usage regulations across all audience and identity data systems.

Requirements

6+ years of experience as a Data Engineer or Data Platform Engineer, ideally within advertising
Strong hands-on experience with Databricks, Spark, and modern cloud data platforms
Proven experience designing and maintaining multi-layered data architectures (Bronze / Silver / Gold)
Deep expertise in SQL and proficiency in Python or another scripting language
Experience working with identity graphs, audience data, and complex join logic across large datasets
Hands-on experience building and managing automated data pipelines and third-party data integrations
Familiarity with adtech, martech, or audience activation platforms, including exposure to identity providers such as Experian or similar vendors
Practical experience supporting AI/ML workflows or AI-enabled analytics, with an understanding of data quality, schema design, and performance optimization in cloud-scale environments

Responsibilities

Establish and own technical management of the Audience Engine, transitioning knowledge from external partners and building a scalable, internally managed data platform
Reduce reliance on third-party vendors while maintaining high standards for platform reliability, uptime, and data integrity across all pipelines and integrations
Architect, own, and continuously optimize Databricks Bronze → Silver → Gold pipelines to support large-scale audience ingestion, enrichment, and activation workflows
Ensure the platform is scalable and cost-efficient through effective pipeline optimization and resource management
Proactively monitor, troubleshoot, and resolve ingestion failures, schema drift, and data quality issues across batch and near real-time data pipelines
Lead identity resolution across datasets including HHID, DSID, MAID, IP, and campaign logs, improving matching logic to strengthen the Household Identity Graph
Guarantee accurate and timely audience publication by maintaining robust downstream data delivery processes and SLAs across activation platforms and partners
Own and maintain integrations with external data and activation partners (e.g., Experian, FreeWheel, DSPs), ensuring seamless data exchange and operational reliability
Partner cross-functionally with Product, Analytics, and Ad Operations teams to onboard new data sources and activation endpoints, enabling new use cases and revenue opportunities
Support and operationalize applied AI/ML workflows within Databricks, including integration with tools like Akkio, and translate business needs into production-ready data features and datasets
Design and implement scalable data models and query-ready datasets that power BI reporting, audience insights, and predictive modeling, ensuring alignment with downstream analytics requirements
Enforce data governance, lineage, and documentation standards while ensuring compliance with privacy, consent, and data usage regulations across all audience and identity data systems