AI Information Scientist

Bilue

11d • Remote

About The Position

AI systems don’t fail because of bad models. They fail because of bad libraries: outdated documents cited as current, knowledge that exists but can’t be found, and datasets that are technically present but practically untrustworthy. The AI Information Scientist owns that problem. This is not a traditional data engineering role. It sits at the intersection of data governance, knowledge management, and AI delivery. You will design and maintain the catalogues, metadata schemas, quality frameworks, and lineage structures that every AI system in The Foundry depends on, and you will work directly alongside AI Engineers to connect retrieval systems to the clean, context-aware knowledge stores you’ve built. Think of it this way: the AI Engineer builds the system that searches the library. You build the library.

Requirements

A background in data governance, information management, knowledge management, or records management — with genuine interest in how that work enables AI delivery.
Hands-on experience with at least one data catalogue platform (DataHub, OpenMetadata, Collibra, Alation, Apache Atlas, or a major cloud equivalent) and familiarity with metadata standards such as JSON-LD, Dublin Core, or domain-specific ontologies.
Strong SQL skills; working Python for data profiling, quality scripting, and metadata automation; and comfort with dbt or OpenLineage for lineage tracking.
Understanding of vector databases and RAG architecture — enough to know how metadata quality directly affects retrieval precision.
Experience in regulated or high-stakes data environments where provenance and auditability genuinely matter: financial services, insurance, government, healthcare, or similar.
A collaborative, low-ego disposition. The AI Information Scientist’s work is structural and often invisible. The glory goes to the AI system. You are fine with that.

Nice To Haves

A formal background in library or information science.
Experience with knowledge graphs (Neo4j, RDF, SPARQL).
Prior consulting or agency delivery experience.

Responsibilities

Design and maintain data catalogues for AI projects using platforms such as DataHub, OpenMetadata, Apache Atlas, Collibra, or cloud-native equivalents (AWS Glue Data Catalog, Microsoft Purview (formerly Azure Purview), Google Cloud Dataplex).
Define metadata schemas and taxonomy standards — type, version, jurisdiction, validity period, confidence tier — so retrieval systems know not just what a document is, but when it applies and how much to trust it.
Assess data quality across client and internal assets using tools like Great Expectations, dbt tests, or Soda; flag stale, superseded, or ambiguous records before they reach the AI layer.
Build and maintain data lineage so every AI-generated output can be traced back to its source, version, and validity period — making outputs auditable, not just accurate.
Design automated ingestion workflows and nightly quality checks that keep catalogues current without constant manual intervention.
Partner with The Foundry engineers to connect RAG pipelines and agentic retrieval systems to catalogue APIs, and with The Labs strategists to map client data estates before a project begins.