Data Engineer

Sunset
New York, NY
$180,000 - $280,000
Onsite

About The Position

Sunset is building the data layer for real-world AI training. We work with frontier labs to turn messy, multi-modal enterprise data into the highest-quality training data on the market, sourced from the hundreds of venture-backed startups we've helped wind down. We're a fast-growing team working in person in Dumbo, Brooklyn, and we're backed by Floodgate, Afore Capital, Hustle Fund, and incredible entrepreneurs.

As a Data Engineer at Sunset, you'll own the pipeline that turns raw, chaotic enterprise data into that training data. One of our core technical problems is entity resolution and de-identification across different sources and modalities. The deeper challenge is understanding the node structures and linkages well enough to reconstruct the business world this data comes from.

Requirements

  • You are a product-minded engineer who has shipped data pipelines at scale
  • You have strong Python skills and are comfortable across NER, record linkage, and coreference
  • You want to own a hard, ambiguous problem end-to-end rather than wait for a PRD
  • AI is deeply integrated into your workflow and life

Responsibilities

  • Extending our entity resolution pipeline to handle new modalities — think audio transcripts, design files, or embedded references inside PDF contracts
  • Building coreference resolution across Slack threads, email chains, and Linear comments so that "me," "him," and first-name mentions all resolve to the right canonical entity
  • Designing the de-identification layer that replaces PII with stable pseudonyms while preserving every relationship across every source
  • Figuring out how to ingest formats we've never seen before, fast

Benefits

  • 100% covered medical, dental, and vision
  • Unlimited PTO
  • $500 in-office setup allowance