Data Scientist

Scrunch•New York, NY

12d•Hybrid

About The Position

We’re looking for a Data Scientist to help us build, measure, and improve AI-augmented information retrieval and web visibility, spanning retrieval & ranking, NLP, experimentation, and knowledge graph / semantic web systems. You’ll partner closely with Engineering, Product, and Marketing to turn ambiguous questions into measurable work and shippable features.

Location: NYC | Hybrid | 3x/week

Requirements

Strong foundations in statistics, experimental design, and model evaluation.
Hands-on experience with information retrieval / ranking / search relevance, ideally in AI-augmented contexts.
Experience building and evaluating NLP models in production or research settings.
Proficiency in Python and SQL.
Strong communication: you can explain what you did, why it matters, and how confident you are, without hand-waving.
Deep comfort with the web as a system (crawl/index realities, domains, content structure, measurement constraints).
Experience working with non-representative data and making results more trustworthy through sampling-aware analysis (bias checks, adjustments, uncertainty).
Experience working with noisy/partial labels, long-tailed queries, drifting content, and evaluation that mixes offline + online signals.

Nice To Haves

Reinforcement learning / control theory / optimal control applied to ranking, allocation, or policy optimization.
Semantic Web / Knowledge Graph tooling (RDF/OWL concepts, graph DBs, SPARQL, entity resolution).
Familiarity with SEO and AEO (answer engine optimization), and the ability to connect technical visibility drivers to business outcomes.
Marketing analytics experience (attribution-adjacent thinking, funnel measurement, incrementality).
Publications in top-tier venues (e.g., SIGIR and related IR/NLP conferences) or equivalent demonstrated research depth.

Responsibilities

Design, prototype, and productionize models/algorithms for retrieval, ranking, and relevance quality across web and AI-assisted surfaces.
Build NLP pipelines (classification, entity extraction, topic modeling, sentiment analysis) and validate them with clear offline + online metrics.
Own measurement and experimentation: hypotheses, experiment design, guardrails, and readouts that drive decisions.
Develop simulation / modeling frameworks to forecast outcomes, test policies, and stress-test system behavior under different assumptions.
Contribute to knowledge graph / semantic web work: schema design, entity resolution, and downstream ML / GenAI applications.
Translate technical work into crisp narratives for stakeholders (product tradeoffs, confidence, limitations, and next steps).
Contribute to external thought leadership where it makes sense (blog posts, talks, papers).
