Senior Data Engineer

Cobalt Identity Systems•New York City, NY

5d•Onsite

About The Position

Cobalt ID is building the business identity infrastructure for the financial internet. While others focus on consumers, we separate real companies from synthetic ones. With AI accelerating fraud rings and shell companies at global scale, distinguishing a legitimate business from a sophisticated fraudster is now one of the hardest problems in fintech. We're mapping 100M+ businesses and counting to expose hidden financial crime networks and ensure real businesses are never left out of the financial ecosystem.

As a Senior Data Engineer, you'll own the data layer that makes everything else possible. You'll build the ingestion pipelines, entity resolution systems, and data quality infrastructure that connect raw source data to a unified view of every entity in our graph, in a way that is fast, accurate, and explainable for compliance.

The problems here are specific and mostly unsolved by the industry. Similar infrastructure powers leading social media platforms, search engines, and data fusion platforms, but it hasn't yet been applied to this problem. If you're energized by turning chaos into structure at massive scale, this role is for you.

Requirements

4+ years building production data pipelines and infrastructure (we care more about skill and impact than years alone)
Experience with large-scale data processing. You've built ETL/ELT systems that handled messy, real-world data at meaningful volume
Hands-on experience with entity resolution, record linkage, or data deduplication. You understand the algorithmic and practical challenges of matching records across noisy sources
Strong fundamentals in data modeling and pipeline orchestration
Comfort with ambiguity and fast iteration in an early-stage environment
You care about data quality as a first-class engineering problem, not an afterthought
You want to be close to the problem and the customer, not siloed from product decisions

Nice To Haves

Experience ingesting and normalizing data across unstructured and semi-structured sources
Background in knowledge graph construction, graph databases, or large-scale entity graph systems
Experience with NLP or LLM-based approaches to entity resolution or document extraction
Background in fraud detection, identity systems, ads ranking, recommendation systems, or other domains that require profiling and linking entities at scale
Familiarity with data infrastructure on cloud platforms at production scale

Responsibilities

Design and build production data pipelines that ingest, normalize, and link data from hundreds of heterogeneous sources
Build and maintain data quality infrastructure: monitoring, validation, deduplication, and freshness tracking across millions of data points
Develop the ingestion and processing layer for unstructured and semi-structured data, including document parsing and extraction from inconsistent sources
Instrument and monitor pipeline health, data coverage, and entity resolution accuracy as the system scales
Ship to production constantly - we're a small team and everything you build matters
Collaborate directly with founders and customers to shape what we build next
