Data Engineer with Expert-level SQL

Capgemini
New York, NY
$100,000 - $130,000
Onsite

About The Position

Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you’d like, where you’ll be supported and inspired by a collaborative community of colleagues around the world, and where you’ll be able to reimagine what’s possible. Join us and help the world’s leading organizations unlock the value of technology and build a more sustainable, more inclusive world.

Onsite: New York

Job Description

Key Responsibilities

Pipeline Engineering

  • Design and maintain high-throughput ingestion pipelines for transaction signals, behavioral events, and third-party identity graphs - including LiveRamp RampID, UID2, GCLID chains, and household device graphs
  • Implement identity resolution logic at scale: deterministic matching, probabilistic graph construction, and household + device-level cluster assembly across 1B+ data points
  • Build and maintain data clean room connectors and privacy-preserving data exchange pipelines (AWS Clean Rooms, LiveRamp DCR, Google ADH, or equivalent)
  • Develop integrations between activation platforms (email, CDP, DSP) and the identity graph layer - supporting real-time audience push and match rate monitoring

Data Modeling & Quality

  • Design medallion-architecture or equivalent data models optimized for cohort-level LTV/CAC attribution and multi-touch attribution across owned, paid, and clean room channels
  • Build automated QC and reconciliation frameworks - deduplication, compliance validation, and data lineage tracking - capable of reducing manual reconciliation cycles from weeks to hours
  • Implement PII governance controls at the pipeline layer: redacted ID egress, consent signal propagation, and guardrail validation aligned to GLBA, Fair Lending, UDAAP, and TCPA/CAN-SPAM

Platform Integration

  • Integrate LLM-based APIs (e.g., Anthropic Claude, OpenAI, Vertex AI) for AI-powered signal enrichment, audience brief generation, and compliance pre-screening within pipeline workflows
  • Build serverless microservices and API bridge layers connecting clean room outputs to activation destinations - using any major serverless or edge compute platform
  • Maintain and evolve authentication, email notification, and managed database services supporting platform-facing APIs and client-facing tooling

Requirements

  • 5+ years of data engineering experience
  • Expert-level SQL across at least one major cloud data warehouse: Snowflake, Google BigQuery, Amazon Redshift, or Azure Synapse
  • Proficiency in Python for pipeline development, transformation logic, and data quality automation
  • Hands-on experience with at least one clean room technology: AWS Clean Rooms, LiveRamp DCR, Google ADH, InfoSum, or equivalent privacy-preserving data collaboration platform
  • Deep understanding of identity resolution concepts: deterministic matching, probabilistic graph construction, household-level aggregation, and device graph assembly
  • Strong PII governance knowledge: data residency, consent frameworks, and financial services regulatory requirements (GLBA, Fair Lending, UDAAP)
  • Experience integrating with DSPs, CDPs, or marketing activation platforms at the data layer
  • Ability to operate in client-facing consulting delivery contexts - translating business requirements into technical specifications

Nice To Haves

  • Experience with graph database technologies - Neo4j, Amazon Neptune, or TigerGraph - for identity graph storage and traversal
  • Familiarity with LiveRamp Embedded Identity, UID2 token handling, or walled garden attribution integrations (Google ADH, Meta CAPI, Amazon Attribution)
  • Working knowledge of LLM APIs for structured data enrichment and AI-assisted pipeline workflows

Benefits

  • Paid time off based on employee grade (A-F), as defined by policy: vacation (12-25 days, depending on grade), company-paid holidays, personal days, and sick leave
  • Medical, dental, and vision coverage (or provincial healthcare coordination in Canada)
  • Retirement savings plans (e.g., 401(k) in the U.S., RRSP in Canada)
  • Life and disability insurance
  • Employee assistance programs
  • Other benefits as provided by local policy and eligibility

What This Job Offers

Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed
Number of Employees: 5,001-10,000 employees
