AI Data Architect (Remote)

Great Day Improvements: A Family of Brands | Twinsburg, OH
$155,000 - $165,000 | Remote

About The Position

This role is foundational to scaling trusted AI across the enterprise. The AI Data Architect owns how enterprise data is structured, enriched, and retrieved to power AI-driven decisioning across Great Day Improvements, with a focus on building scalable Retrieval-Augmented Generation (RAG) systems that keep AI outputs accurate, consistent, and grounded in authoritative data. The role defines the chunking strategies, metadata schemas, hybrid retrieval logic, and authority ranking models that form the foundation of trustworthy AI at Great Day Improvements.

The ideal candidate has deep experience in data architecture, semantic search, and knowledge systems, with hands-on expertise in vector databases, embedding models, and retrieval pipeline design. They will lead the translation of business domain knowledge, such as HR policies, call center procedures, and operational workflows, into structured data models, taxonomies, and retrieval logic that AI systems can reliably interpret. They will also establish and enforce the data architecture standards that engineering, product, and operations teams rely on, ensuring every AI interaction is well governed, explainable, and continuously improving.

This role requires a systems thinker who is self-motivated, comfortable navigating ambiguity, and energized by the challenge of making enterprise data truly AI-ready. The ideal candidate will embrace emerging technologies, including knowledge graphs, agentic AI architectures, and LLMOps observability tooling, and will be enthusiastic about building the data foundations that enable Great Day Improvements to scale AI across its growing portfolio of brands. The role defines and governs the architecture and standards for AI data systems while partnering with the engineering and platform teams responsible for implementation and execution.

Requirements

  • 5+ years of experience in data architecture, search systems, knowledge engineering, or applied AI, with demonstrated experience designing and driving adoption of scalable data or AI architectures across teams or domains
  • Hands-on experience designing, building, or optimizing RAG systems or semantic search platforms
  • Strong understanding of vector embeddings, including how they are generated and stored and their limitations across different embedding models
  • Demonstrated experience with chunking strategies and the tradeoffs between granularity, context preservation, and retrieval quality
  • Proficiency in hybrid retrieval approaches combining vector similarity, keyword search, and metadata filtering
  • Experience with reranking techniques and relevance tuning for production retrieval systems
  • Experience designing metadata schemas, taxonomies, ontologies, or knowledge graphs for enterprise data
  • Proven ability to work with unstructured enterprise data (documents, PDFs, knowledge bases, transcripts, wikis)
  • Experience designing and working with vector databases and search platforms (e.g., Pinecone, Weaviate, Qdrant, Elasticsearch, FAISS)
  • Working knowledge of LLM APIs, prompt engineering, and orchestration patterns, with the ability to evaluate and adapt across frameworks (e.g., LangChain, LlamaIndex)
  • Familiarity with data pipelines, ETL/ELT processes, and API architecture at a systems design level
  • Understanding of access control, data security, and compliance considerations in AI-powered data systems
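To illustrate the hybrid retrieval concept the requirements call for, here is a minimal, library-free sketch that blends vector similarity, keyword overlap, and a hard metadata filter. The corpus, embeddings, field names, and the `alpha` blend weight are all illustrative assumptions, not a description of the actual production stack.

```python
import math

# Toy corpus: each document carries a tiny illustrative embedding, raw text
# for keyword matching, and metadata for governance-style filtering.
DOCS = [
    {"id": "hr-001", "vec": [0.9, 0.1, 0.0], "text": "pto policy vacation accrual",
     "meta": {"domain": "hr", "status": "approved"}},
    {"id": "ops-007", "vec": [0.1, 0.8, 0.1], "text": "call center escalation procedure",
     "meta": {"domain": "callcenter", "status": "approved"}},
    {"id": "hr-099", "vec": [0.8, 0.2, 0.0], "text": "draft pto policy update",
     "meta": {"domain": "hr", "status": "draft"}},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, text):
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_search(query_vec, query_text, meta_filter, alpha=0.7):
    """Blend vector and keyword scores after a hard metadata filter."""
    results = []
    for doc in DOCS:
        # Metadata filtering: drop anything that fails the filter outright,
        # e.g. unapproved content that must never reach an AI answer.
        if any(doc["meta"].get(k) != v for k, v in meta_filter.items()):
            continue
        score = (alpha * cosine(query_vec, doc["vec"])
                 + (1 - alpha) * keyword_score(query_text, doc["text"]))
        results.append((doc["id"], round(score, 3)))
    return sorted(results, key=lambda r: r[1], reverse=True)

hits = hybrid_search([1.0, 0.0, 0.0], "pto policy", {"status": "approved"})
print(hits)  # the approved HR doc ranks first; the draft is filtered out
```

Production systems would replace the linear blend with techniques such as reciprocal rank fusion or a learned reranker, but the structure (filter, score, fuse, sort) is the same.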

Nice To Haves

  • Experience with knowledge graph technologies (e.g., Neo4j, RDF/OWL, SPARQL) and GraphRAG architectures
  • Familiarity with agentic AI frameworks (e.g., LangGraph, CrewAI, AutoGen) and multi-agent system design
  • Experience with LLMOps and observability tooling (e.g., LangSmith, Langfuse, RAGAS evaluation frameworks)
  • Proficiency in Python for data processing, pipeline scripting, and integration tasks
  • Experience with cloud AI services on AWS, Azure, or GCP (e.g., Amazon Bedrock, Azure AI, Vertex AI)
  • Background in the home improvement, manufacturing, or direct-to-consumer industry
  • Experience with Model Context Protocol (MCP) or similar standards for AI-to-tool interoperability
  • Master’s degree in Computer Science, Data Science, Information Science, or a related field

Responsibilities

  • Design end-to-end Retrieval-Augmented Generation (RAG) architecture, including ingestion, chunking, embedding, indexing, retrieval, and response generation
  • Define chunking strategies based on content type, semantic coherence, and use case requirements
  • Build metadata schemas, tagging frameworks, and document structures to optimize retrieval precision
  • Develop hybrid retrieval strategies combining vector similarity, keyword search, metadata filters, and graph-based reasoning
  • Implement reranking logic and relevance scoring to optimize answer accuracy and grounding
  • Establish retrieval pipelines that consistently return high-quality, contextually relevant results across enterprise use cases
  • Own upstream data preparation standards that enable effective retrieval, clearly separating data structuring responsibilities from downstream retrieval and RAG execution
  • Define standards for document ingestion, cleaning, parsing, and normalization across structured and unstructured enterprise data prior to retrieval
  • Transform raw enterprise data (PDFs, knowledge bases, policies, call transcripts, wiki pages) into AI-ready formats
  • Create canonical document structures and semantic representations prior to vectorization
  • Standardize taxonomy, terminology, and metadata across business domains to ensure consistency at scale
  • Design and maintain ontologies and knowledge graphs that enrich retrieval context and reduce hallucinations
  • Define and govern the onboarding, validation, and lifecycle management of enterprise data sources, including approval, updates, and deprecation of content used in AI systems
  • Define and enforce data quality standards for enterprise content, including completeness, consistency, accuracy, and maintainability of data used in AI systems
  • Lead the extraction, structuring, and codification of business domain knowledge for AI consumption
  • Translate business rules into metadata models, labeling strategies, and retrieval logic
  • Define how different content types (policies, FAQs, procedures, product documentation) are interpreted, prioritized, and surfaced by AI
  • Define and enforce alignment of AI behavior with real-world business intent, decision logic, and operational workflows
  • Define source-of-truth hierarchies and authority ranking models across content repositories
  • Implement version control, document freshness tracking, and conflict resolution strategies for overlapping content
  • Define access control logic (SSO, role-based access) within retrieval workflows to ensure data security and compliance
  • Define standards to ensure AI responses are traceable, explainable, and grounded in authoritative, auditable sources
  • Define data lineage and provenance tracking standards to support governance and regulatory requirements
  • Drive adoption of AI data architecture standards across engineering, product, and business teams, ensuring compliance with defined data, retrieval, and governance models
  • Define and govern where and how LLMs are used across the pipeline (classification, routing, summarization, answering)
  • Balance cost, latency, and performance across model usage, including token optimization strategies
  • Define and enforce query routing strategies based on user intent (policy lookup, FAQ, transactional, analytical)
  • Own and optimize orchestration between retrieval systems, LLMs, and agentic AI workflows
  • Own evaluation and integration of emerging orchestration frameworks and Model Context Protocol (MCP) standards
  • Own evaluation frameworks for retrieval accuracy, answer quality, and grounding using tools such as RAGAS, LangSmith, or Langfuse
  • Own the creation and maintenance of domain-specific test sets across business areas (HR, call center, operations, product knowledge)
  • Own analysis of failure cases and continuous improvement of retrieval strategies, chunking approaches, and data structuring
  • Define measurable performance standards for precision, recall, grounding, consistency, and latency
  • Own observability and monitoring pipelines to track retrieval and LLM performance in production
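The chunking and provenance responsibilities above can be sketched in a few lines: sentence-aware chunking with a one-sentence overlap to preserve context across boundaries, and a chunk ID scheme that keeps every chunk traceable to its source document. The size cap, overlap width, and ID format are illustrative assumptions for this sketch.

```python
import re

def chunk_document(text, doc_id, max_chars=200, overlap_sents=1):
    """Pack sentences into chunks up to max_chars, carrying a
    one-sentence overlap so context survives chunk boundaries."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    chunks, current = [], []
    for sent in sentences:
        if current and len(" ".join(current + [sent])) > max_chars:
            chunks.append(current)
            current = current[-overlap_sents:]  # overlap preserves local context
        current.append(sent)
    if current:
        chunks.append(current)
    # Emit chunks with provenance metadata so retrieval stays traceable
    # back to the authoritative source document.
    return [{"chunk_id": f"{doc_id}#{i}", "source": doc_id, "text": " ".join(c)}
            for i, c in enumerate(chunks)]

policy = ("Employees accrue PTO monthly. Unused PTO rolls over up to 40 hours. "
          "Requests require manager approval. Approvals are logged in the HR system.")
for c in chunk_document(policy, "hr-pto-policy", max_chars=80):
    print(c["chunk_id"], "->", c["text"])
```

In practice the splitting rule would vary by content type (tables, transcripts, policy documents), which is exactly the per-content-type strategy work this role owns; the fixed regex here is only a stand-in.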