Senior Data Engineer

Cobalt Identity Systems
New York City, NY
Onsite

About The Position

Cobalt ID is building the business identity infrastructure for the financial internet. While others focus on consumers, we separate real companies from synthetic ones. With AI accelerating fraud rings and shell companies at global scale, distinguishing a legitimate business from a sophisticated fraudster is now one of the hardest problems in fintech. We're mapping 100M+ businesses and counting to expose hidden financial crime networks and ensure real businesses are never left out of the financial ecosystem.

As a Senior Data Engineer, you'll own the data layer that makes everything else possible. You'll build the ingestion pipelines, entity resolution systems, and data quality infrastructure that connect raw source data to a unified view of every entity in our graph, in a way that's fast, accurate, and explainable for compliance. The problems here are specific and mostly unsolved by the industry. Similar infrastructure powers leading social media platforms, search engines, and data fusion platforms, but it hasn't yet been applied to this problem. If you're energized by turning chaos into structure at massive scale, this role is for you.

Requirements

  • 4+ years building production data pipelines and infrastructure (we care more about skill and impact than years alone)
  • Experience with large-scale data processing. You've built ETL/ELT systems that handled messy, real-world data at meaningful volume
  • Hands-on experience with entity resolution, record linkage, or data deduplication. You understand the algorithmic and practical challenges of matching records across noisy sources
  • Strong fundamentals in data modeling and pipeline orchestration
  • Comfort with ambiguity and fast iteration in an early-stage environment
  • You care about data quality as a first-class engineering problem, not an afterthought
  • You want to be close to the problem and the customer, not siloed from product decisions

Nice To Haves

  • Experience ingesting and normalizing data across unstructured / semi-structured sources
  • Background in knowledge graph construction, graph databases, or large-scale entity graph systems
  • Experience with NLP or LLM-based approaches to entity resolution or document extraction
  • Background in fraud detection, identity systems, ads ranking, recommendation systems, or other domains that require profiling and linking entities at scale
  • Familiarity with data infrastructure on cloud platforms at production scale

Responsibilities

  • Design and build production data pipelines that ingest, normalize, and link data from hundreds of heterogeneous sources
  • Build and maintain data quality infrastructure: monitoring, validation, deduplication, and freshness tracking across millions of data points
  • Develop the ingestion and processing layer for unstructured and semi-structured data, including document parsing and extraction from inconsistent sources
  • Instrument and monitor pipeline health, data coverage, and entity resolution accuracy as the system scales
  • Ship to production constantly. We're a small team and everything you build matters
  • Collaborate directly with founders and customers to shape what we build next