Data Engineer (Founding Team)

Fabrion
San Francisco Bay Area, CA

About The Position

We’re building a multi-tenant, AI-native platform where enterprise data becomes actionable through semantic enrichment, intelligent agents, and governed interoperability. At the heart of this architecture lies our Data Fabric: an intelligent, governed layer that turns fragmented, siloed data into a connected ontology ready for model training, vector search, and insight-to-action workflows.

We’re looking for engineers who enjoy hard data problems at scale: messy unstructured data, schema drift, multi-source joins, security models, and AI-ready semantic enrichment. You’ll build the backend systems, data pipelines, connector frameworks, and graph-based knowledge models that fuel agentic applications. If you’ve worked on streaming unstructured pipelines, built connectors into ugly legacy systems, or mapped knowledge graphs that scale, this role will feel like home.

Requirements

  • 5+ years building large-scale data infrastructure in production environments
  • Deep experience with ingestion frameworks (Kafka, Airbyte, Meltano, Fivetran) and data pipeline orchestration (Airflow, Dagster, Prefect)
  • Comfortable processing unstructured and semi-structured data: PDFs, Excel, emails, logs, CSVs, web APIs
  • Experience working with columnar stores, object storage, and lakehouse formats (Iceberg, Delta, Parquet)
  • Strong background in knowledge graphs or semantic modeling (e.g. Neo4j, RDF, Gremlin, PuppyGraph)
  • Familiarity with GraphQL, RESTful APIs, and designing developer-friendly data access layers
  • Experience implementing data governance: RBAC, ABAC, data contracts, lineage, and data quality checks (a minimal contract-check sketch follows this list)
  • You’re a systems thinker: you want to model the real world, not just process it
  • Comfortable navigating ambiguous data models and building from scratch
  • Passionate about enabling AI systems with real-world, messy enterprise data
  • Pragmatic about scalability, observability, and schema evolution
  • Value autonomy, high trust, and meaningful ownership over infrastructure
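
For a concrete flavor of the data-contract and schema-drift work above, here is a minimal sketch in Python. Everything in it (the FieldSpec type, the ORDERS_CONTRACT, the field names) is hypothetical and illustrative, not part of Fabrion's actual stack:

    from dataclasses import dataclass

    # Hypothetical contract: the field names, types, and nullability that a
    # downstream consumer depends on. Illustrative only.
    @dataclass(frozen=True)
    class FieldSpec:
        name: str
        dtype: type
        nullable: bool = False

    ORDERS_CONTRACT = [
        FieldSpec("order_id", str),
        FieldSpec("amount", float),
        FieldSpec("customer_email", str, nullable=True),
    ]

    def validate_record(record: dict, contract: list[FieldSpec]) -> list[str]:
        """Return a list of contract violations for a single record."""
        violations = []
        expected = {spec.name for spec in contract}
        for spec in contract:
            if spec.name not in record:
                violations.append(f"missing field: {spec.name}")
                continue
            value = record[spec.name]
            if value is None:
                if not spec.nullable:
                    violations.append(f"null in non-nullable field: {spec.name}")
            elif not isinstance(value, spec.dtype):
                violations.append(
                    f"type drift in {spec.name}: expected {spec.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
        # Extra fields signal upstream schema drift worth surfacing, even if
        # current consumers would simply ignore them.
        for extra in record.keys() - expected:
            violations.append(f"unexpected field: {extra}")
        return violations

    # A record whose upstream schema has drifted: amount arrives as a string
    # and a new "channel" field has appeared.
    print(validate_record({"order_id": "A-123", "amount": "19.99", "channel": "web"},
                          ORDERS_CONTRACT))

In production, a check like this would typically run as an Airflow or Dagster task and emit lineage and data-quality metadata rather than printing.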

Nice To Haves

  • Prior work with vector DBs (e.g. Weaviate, Qdrant, Pinecone) and embedding pipelines
  • Experience building or contributing to enterprise connector ecosystems
  • Knowledge of ontology versioning, graph diffing, or semantic schema alignment
  • Familiarity with data fabric patterns (e.g. Palantir Ontology, Linked Data, W3C standards)
  • Familiarity with fine-tuning LLMs or enabling RAG pipelines using enterprise knowledge
  • Experience enforcing data access policy with tools like OPA, Keycloak, or Snowflake row-level security

Responsibilities

  • Build highly reliable, scalable data ingestion and transformation pipelines across structured, semi-structured, and unstructured data sources
  • Develop and maintain a connector framework for ingesting from enterprise systems (ERPs, PLMs, CRMs, legacy data stores, email, Excel, docs, etc.)
  • Design and maintain the data fabric layer, including a knowledge graph (Neo4j or PuppyGraph) enriched with ontologies, metadata, and relationships
  • Normalize and vectorize data for downstream AI/LLM workflows, enabling retrieval-augmented generation (RAG), summarization, and alerting (see the sketch after this list)
  • Create and manage data contracts, access layers, lineage, and governance mechanisms
  • Build and expose secure APIs for downstream services, agents, and users to query enriched semantic data
  • Collaborate with ML/LLM teams to feed high-quality enterprise data into model training and tuning pipelines
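
As one illustration of the normalize-and-vectorize responsibility flagged above, here is a deliberately naive Python sketch of chunking, embedding, and retrieval. The embed function is a stand-in for a real embedding model, and the sample document is invented:

    import math

    def embed(text: str, dim: int = 64) -> list[float]:
        """Stand-in for a real embedding model: hashes character trigrams
        into a fixed-size unit vector. Illustrative only."""
        vec = [0.0] * dim
        for i in range(len(text) - 2):
            vec[hash(text[i:i + 3]) % dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    def chunk(text: str, size: int = 80, overlap: int = 20) -> list[str]:
        """Fixed-window chunking; real pipelines respect the sentence and
        section boundaries recovered during normalization."""
        step = size - overlap
        return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

    def retrieve(index: list[tuple[str, list[float]]], query: str, k: int = 3) -> list[str]:
        """Rank chunks by cosine similarity (a dot product, since vectors are
        unit-normalized) and return the top k as RAG context."""
        q = embed(query)
        ranked = sorted(index,
                        key=lambda item: sum(a * b for a, b in zip(item[1], q)),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

    doc = ("Refunds are accepted within 30 days of purchase. Enterprise customers "
           "may request extended return windows through their account manager. "
           "All refund requests require the original order ID.")
    index = [(c, embed(c)) for c in chunk(doc)]
    print(retrieve(index, "What is the refund window?", k=1))

In a real pipeline the index lives in a vector store (Weaviate, Qdrant, Pinecone), the embeddings come from a trained model, and the retrieved chunks, joined with entity context from the knowledge graph, become the prompt context for RAG, summarization, and alerting.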