Senior AI Engineer, Agentic Data Enrichment

Baselayer • San Francisco, CA

5d • $135,000 - $220,000 • Hybrid

About The Position

Baselayer answers questions the loan application didn't ask. For every business that crosses our queues, we need to know things that aren't on the form: what the business actually does, where it actually lives on the web, whether the people it names match the public record, and whether anything across the open web contradicts the story we were told. We answer those questions with LLM-driven agents that crawl, click, search, and extract structured evidence from across the web - and we treat this as a production data pipeline, not a research demo. We're hiring a Senior AI Engineer to own a slice of this enrichment surface end-to-end.

Requirements

Shipped LLM-driven agents to production - not notebooks, not demos. Real users, real cost, real failure modes, real on-call.
Strong async Python, plus fluency with structured-data libraries, modern web frameworks, and relational databases.
Experience across multiple frontier LLM providers and at least one agent framework, with deep knowledge of their failure modes.
Built or maintained eval methodology: curated golden datasets, scoring functions, labelling guidelines, regression diagnostics.
Browser automation experience: headless browsers, anti-bot evasion, authenticated flows.
Informed opinions on structured-output reliability: when to use JSON-schema mode vs. function calling vs. extractor-on-top-of-text.

Nice To Haves

Web scraping at scale: anti-bot evasion, residential proxies, request fingerprinting, authenticated flows, CDN defeats.
Eval-framework experience (e.g., LangSmith, Braintrust, Evals, or custom).
Entity resolution / record linkage / fuzzy matching at scale.
Browser-automation experience at the devtools-protocol level.
Built a tool registry or toolset abstraction over multiple LLM providers.
Cost/latency optimization: response caching, semantic caching, model routing (cheap-first then escalate), thinking-budget tuning, prompt-cache hit-rate work.

Responsibilities

Own industry/category classification of businesses from heterogeneous signals (name, website, directory presence, reviews).
Build and maintain discovery and verification systems for a business's real web presence - filtering aggregators, parked domains, brand collisions, and impersonators.
Link individuals to businesses via public web evidence (e.g., confirming that a named officer or employee genuinely works there).
Develop risk/legitimacy scoring derived from web-presence signals, fed back into downstream underwriting.
Build and evolve the shared agent infrastructure: provider-agnostic base agents, shared toolset registry (browser navigation, search, scraping, structured database lookups, scoring), eval harness, and instrumentation surface for token-and-tool tracing.
Own model selection, agent design, prompt and tool engineering, eval methodology, and cost control across your enrichment surface.