AI Data Architect (Remote)

Great Day Improvements: A Family of Brands | Twinsburg, OH
$155,000 - $165,000 | Remote

About The Position

This role is foundational to scaling trusted AI across the enterprise. The AI Data Architect owns how enterprise data is structured, enriched, and retrieved to power AI-driven decisioning across Great Day Improvements, with a focus on building scalable Retrieval-Augmented Generation (RAG) systems that keep AI outputs accurate, consistent, and grounded in authoritative data. The role defines the chunking strategies, metadata schemas, hybrid retrieval logic, and authority ranking models that form the foundation of trustworthy AI at Great Day Improvements.

The ideal candidate has deep experience in data architecture, semantic search, and knowledge systems, with hands-on expertise in vector databases, embedding models, and retrieval pipeline design. They will lead the translation of business domain knowledge, such as HR policies, call center procedures, and operational workflows, into structured data models, taxonomies, and retrieval logic that AI systems can reliably interpret. They will also establish and enforce the data architecture standards that engineering, product, and operations teams rely on, ensuring every AI interaction is well governed, explainable, and continuously improving.

This role requires a systems thinker who is self-motivated, comfortable navigating ambiguity, and energized by the challenge of making enterprise data truly AI-ready. The ideal candidate will embrace emerging technologies, including knowledge graphs, agentic AI architectures, and LLMOps observability tooling, and will be enthusiastic about building the data foundations that enable Great Day Improvements to scale AI across its growing portfolio of brands. The role defines and governs the architecture and standards for AI data systems while partnering with the engineering and platform teams responsible for implementation and execution.

Requirements

  • 5+ years of experience in data architecture, search systems, knowledge engineering, or applied AI, with demonstrated experience designing and driving adoption of scalable data or AI architectures across teams or domains
  • Hands-on experience designing, building, or optimizing RAG systems or semantic search platforms
  • Strong understanding of vector embeddings, including how they are generated and stored and their limitations across different embedding models
  • Demonstrated experience with chunking strategies and the tradeoffs between granularity, context preservation, and retrieval quality
  • Proficiency in hybrid retrieval approaches combining vector similarity, keyword search, and metadata filtering
  • Experience with reranking techniques and relevance tuning for production retrieval systems
  • Experience designing metadata schemas, taxonomies, ontologies, or knowledge graphs for enterprise data
  • Proven ability to work with unstructured enterprise data (documents, PDFs, knowledge bases, transcripts, wikis)
  • Experience designing and working with vector databases and search platforms (e.g., Pinecone, Weaviate, Qdrant, Elasticsearch, FAISS)
  • Working knowledge of LLM APIs, prompt engineering, and orchestration patterns, with the ability to evaluate and adapt across frameworks (e.g., LangChain, LlamaIndex)
  • Familiarity with data pipelines, ETL/ELT processes, and API architecture at a systems design level
  • Understanding of access control, data security, and compliance considerations in AI-powered data systems
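To illustrate the hybrid retrieval concept the requirements call for, here is a minimal, library-free sketch that blends vector similarity, keyword overlap, and a hard metadata filter. The corpus, embeddings, field names, and the `alpha` blend weight are all illustrative assumptions, not a description of the actual production stack.

```python
import math

# Toy corpus: each document carries a tiny illustrative embedding, raw text
# for keyword matching, and metadata for governance-style filtering.
DOCS = [
    {"id": "hr-001", "vec": [0.9, 0.1, 0.0], "text": "pto policy vacation accrual",
     "meta": {"domain": "hr", "status": "approved"}},
    {"id": "ops-007", "vec": [0.1, 0.8, 0.1], "text": "call center escalation procedure",
     "meta": {"domain": "callcenter", "status": "approved"}},
    {"id": "hr-099", "vec": [0.8, 0.2, 0.0], "text": "draft pto policy update",
     "meta": {"domain": "hr", "status": "draft"}},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, text):
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_search(query_vec, query_text, meta_filter, alpha=0.7):
    """Blend vector and keyword scores after a hard metadata filter."""
    results = []
    for doc in DOCS:
        # Metadata filtering: drop anything that fails the filter outright,
        # e.g. unapproved content that must never reach an AI answer.
        if any(doc["meta"].get(k) != v for k, v in meta_filter.items()):
            continue
        score = (alpha * cosine(query_vec, doc["vec"])
                 + (1 - alpha) * keyword_score(query_text, doc["text"]))
        results.append((doc["id"], round(score, 3)))
    return sorted(results, key=lambda r: r[1], reverse=True)

hits = hybrid_search([1.0, 0.0, 0.0], "pto policy", {"status": "approved"})
print(hits)  # the approved HR doc ranks first; the draft is filtered out
```

Production systems would replace the linear blend with techniques such as reciprocal rank fusion or a learned reranker, but the structure (filter, score, fuse, sort) is the same.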

Nice To Haves

  • Experience with knowledge graph technologies (e.g., Neo4j, RDF/OWL, SPARQL) and GraphRAG architectures
  • Familiarity with agentic AI frameworks (e.g., LangGraph, CrewAI, AutoGen) and multi-agent system design
  • Experience with LLMOps and observability tooling (e.g., LangSmith, Langfuse, RAGAS evaluation frameworks)
  • Proficiency in Python for data processing, pipeline scripting, and integration tasks
  • Experience with cloud AI services on AWS, Azure, or GCP (e.g., Amazon Bedrock, Azure AI, Vertex AI)
  • Background in the home improvement, manufacturing, or direct-to-consumer industry
  • Experience with Model Context Protocol (MCP) or similar standards for AI-to-tool interoperability
  • Master’s degree in Computer Science, Data Science, Information Science, or a related field

Responsibilities

  • Design end-to-end Retrieval-Augmented Generation (RAG) architecture, including ingestion, chunking, embedding, indexing, retrieval, and response generation
  • Define chunking strategies based on content type, semantic coherence, and use case requirements
  • Build metadata schemas, tagging frameworks, and document structures to optimize retrieval precision
  • Develop hybrid retrieval strategies combining vector similarity, keyword search, metadata filters, and graph-based reasoning
  • Implement reranking logic and relevance scoring to optimize answer accuracy and grounding
  • Establish retrieval pipelines that consistently return high-quality, contextually relevant results across enterprise use cases
  • Own upstream data preparation standards that enable effective retrieval, clearly separating data structuring responsibilities from downstream retrieval and RAG execution
  • Define standards for document ingestion, cleaning, parsing, and normalization across structured and unstructured enterprise data prior to retrieval
  • Transform raw enterprise data (PDFs, knowledge bases, policies, call transcripts, wiki pages) into AI-ready formats
  • Create canonical document structures and semantic representations prior to vectorization
  • Standardize taxonomy, terminology, and metadata across business domains to ensure consistency at scale
  • Design and maintain ontologies and knowledge graphs that enrich retrieval context and reduce hallucinations
  • Define and govern the onboarding, validation, and lifecycle management of enterprise data sources, including approval, updates, and deprecation of content used in AI systems
  • Define and enforce data quality standards for enterprise content, including completeness, consistency, accuracy, and maintainability of data used in AI systems
  • Lead the extraction, structuring, and codification of business domain knowledge for AI consumption
  • Translate business rules into metadata models, labeling strategies, and retrieval logic
  • Define how different content types (policies, FAQs, procedures, product documentation) are interpreted, prioritized, and surfaced by AI
  • Define and enforce alignment of AI behavior with real-world business intent, decision logic, and operational workflows
  • Define source-of-truth hierarchies and authority ranking models across content repositories
  • Implement version control, document freshness tracking, and conflict resolution strategies for overlapping content
  • Define access control logic (SSO, role-based access) within retrieval workflows to ensure data security and compliance
  • Define standards to ensure AI responses are traceable, explainable, and grounded in authoritative, auditable sources
  • Define data lineage and provenance tracking standards to support governance and regulatory requirements
  • Drive adoption of AI data architecture standards across engineering, product, and business teams, ensuring compliance with defined data, retrieval, and governance models
  • Define and govern where and how LLMs are used across the pipeline (classification, routing, summarization, answering)
  • Balance cost, latency, and performance across model usage, including token optimization strategies
  • Define and enforce query routing strategies based on user intent (policy lookup, FAQ, transactional, analytical)
  • Own and optimize orchestration between retrieval systems, LLMs, and agentic AI workflows
  • Own evaluation and integration of emerging orchestration frameworks and Model Context Protocol (MCP) standards
  • Own evaluation frameworks for retrieval accuracy, answer quality, and grounding using tools such as RAGAS, LangSmith, or Langfuse
  • Own the creation and maintenance of domain-specific test sets across business areas (HR, call center, operations, product knowledge)
  • Own analysis of failure cases and continuous improvement of retrieval strategies, chunking approaches, and data structuring
  • Define measurable performance standards for precision, recall, grounding, consistency, and latency
  • Own observability and monitoring pipelines to track retrieval and LLM performance in production
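The chunking and provenance responsibilities above can be sketched in a few lines: sentence-aware chunking with a one-sentence overlap to preserve context across boundaries, and a chunk ID scheme that keeps every chunk traceable to its source document. The size cap, overlap width, and ID format are illustrative assumptions for this sketch.

```python
import re

def chunk_document(text, doc_id, max_chars=200, overlap_sents=1):
    """Pack sentences into chunks up to max_chars, carrying a
    one-sentence overlap so context survives chunk boundaries."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    chunks, current = [], []
    for sent in sentences:
        if current and len(" ".join(current + [sent])) > max_chars:
            chunks.append(current)
            current = current[-overlap_sents:]  # overlap preserves local context
        current.append(sent)
    if current:
        chunks.append(current)
    # Emit chunks with provenance metadata so retrieval stays traceable
    # back to the authoritative source document.
    return [{"chunk_id": f"{doc_id}#{i}", "source": doc_id, "text": " ".join(c)}
            for i, c in enumerate(chunks)]

policy = ("Employees accrue PTO monthly. Unused PTO rolls over up to 40 hours. "
          "Requests require manager approval. Approvals are logged in the HR system.")
for c in chunk_document(policy, "hr-pto-policy", max_chars=80):
    print(c["chunk_id"], "->", c["text"])
```

In practice the splitting rule would vary by content type (tables, transcripts, policy documents), which is exactly the per-content-type strategy work this role owns; the fixed regex here is only a stand-in.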