Data Engineer (Founding Team)

Fabrion
San Francisco Bay Area, CA

About The Position

We’re building a multi-tenant, AI-native platform where enterprise data becomes actionable through semantic enrichment, intelligent agents, and governed interoperability. At the heart of this architecture lies our Data Fabric: an intelligent, governed layer that turns fragmented, siloed data into a connected ontology ready for model training, vector search, and insight-to-action workflows.

We’re looking for engineers who enjoy hard data problems at scale: messy unstructured data, schema drift, multi-source joins, security models, and AI-ready semantic enrichment. You’ll build the backend systems, data pipelines, connector frameworks, and graph-based knowledge models that fuel agentic applications. If you’ve worked on streaming unstructured pipelines, built connectors into ugly legacy systems, or mapped knowledge graphs that scale, this role will feel like home.

Requirements

  • 5+ years building large-scale data infrastructure in production environments
  • Deep experience with ingestion frameworks (Kafka, Airbyte, Meltano, Fivetran) and data pipeline orchestration (Airflow, Dagster, Prefect)
  • Comfortable processing unstructured and semi-structured data: PDFs, Excel, emails, logs, CSVs, web APIs
  • Experience working with columnar stores, object storage, and lakehouse formats (Iceberg, Delta, Parquet)
  • Strong background in knowledge graphs or semantic modeling (e.g. Neo4j, RDF, Gremlin, PuppyGraph)
  • Familiarity with GraphQL, RESTful APIs, and designing developer-friendly data access layers
  • Experience implementing data governance: RBAC, ABAC, data contracts, lineage, and data quality checks (a minimal contract-check sketch follows this list)
  • You’re a systems thinker: you want to model the real world, not just process it
  • Comfortable navigating ambiguous data models and building from scratch
  • Passionate about enabling AI systems with real-world, messy enterprise data
  • Pragmatic about scalability, observability, and schema evolution
  • Value autonomy, high trust, and meaningful ownership over infrastructure
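
For a concrete flavor of the data-contract and schema-drift work above, here is a minimal sketch in Python. Everything in it (the FieldSpec type, the ORDERS_CONTRACT, the field names) is hypothetical and illustrative, not part of Fabrion's actual stack:

    from dataclasses import dataclass

    # Hypothetical contract: the field names, types, and nullability that a
    # downstream consumer depends on. Illustrative only.
    @dataclass(frozen=True)
    class FieldSpec:
        name: str
        dtype: type
        nullable: bool = False

    ORDERS_CONTRACT = [
        FieldSpec("order_id", str),
        FieldSpec("amount", float),
        FieldSpec("customer_email", str, nullable=True),
    ]

    def validate_record(record: dict, contract: list[FieldSpec]) -> list[str]:
        """Return a list of contract violations for a single record."""
        violations = []
        expected = {spec.name for spec in contract}
        for spec in contract:
            if spec.name not in record:
                violations.append(f"missing field: {spec.name}")
                continue
            value = record[spec.name]
            if value is None:
                if not spec.nullable:
                    violations.append(f"null in non-nullable field: {spec.name}")
            elif not isinstance(value, spec.dtype):
                violations.append(
                    f"type drift in {spec.name}: expected {spec.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
        # Extra fields signal upstream schema drift worth surfacing, even if
        # current consumers would simply ignore them.
        for extra in record.keys() - expected:
            violations.append(f"unexpected field: {extra}")
        return violations

    # A record whose upstream schema has drifted: amount arrives as a string
    # and a new "channel" field has appeared.
    print(validate_record({"order_id": "A-123", "amount": "19.99", "channel": "web"},
                          ORDERS_CONTRACT))

In production, a check like this would typically run as an Airflow or Dagster task and emit lineage and data-quality metadata rather than printing.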

Nice To Haves

  • Prior work with vector DBs (e.g. Weaviate, Qdrant, Pinecone) and embedding pipelines
  • Experience building or contributing to enterprise connector ecosystems
  • Knowledge of ontology versioning, graph diffing, or semantic schema alignment
  • Familiarity with data fabric patterns (e.g. Palantir Ontology, Linked Data, W3C standards)
  • Familiarity with fine-tuning LLMs or enabling RAG pipelines using enterprise knowledge
  • Experience enforcing data access policy with tools like OPA, Keycloak, or Snowflake row-level security

Responsibilities

  • Build highly reliable, scalable data ingestion and transformation pipelines across structured, semi-structured, and unstructured data sources
  • Develop and maintain a connector framework for ingesting from enterprise systems (ERPs, PLMs, CRMs, legacy data stores, email, Excel, docs, etc.)
  • Design and maintain the data fabric layer, including a knowledge graph (Neo4j or PuppyGraph) enriched with ontologies, metadata, and relationships
  • Normalize and vectorize data for downstream AI/LLM workflows, enabling retrieval-augmented generation (RAG), summarization, and alerting (see the sketch after this list)
  • Create and manage data contracts, access layers, lineage, and governance mechanisms
  • Build and expose secure APIs for downstream services, agents, and users to query enriched semantic data
  • Collaborate with ML/LLM teams to feed high-quality enterprise data into model training and tuning pipelines
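
As one illustration of the normalize-and-vectorize responsibility flagged above, here is a deliberately naive Python sketch of chunking, embedding, and retrieval. The embed function is a stand-in for a real embedding model, and the sample document is invented:

    import math

    def embed(text: str, dim: int = 64) -> list[float]:
        """Stand-in for a real embedding model: hashes character trigrams
        into a fixed-size unit vector. Illustrative only."""
        vec = [0.0] * dim
        for i in range(len(text) - 2):
            vec[hash(text[i:i + 3]) % dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    def chunk(text: str, size: int = 80, overlap: int = 20) -> list[str]:
        """Fixed-window chunking; real pipelines respect the sentence and
        section boundaries recovered during normalization."""
        step = size - overlap
        return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

    def retrieve(index: list[tuple[str, list[float]]], query: str, k: int = 3) -> list[str]:
        """Rank chunks by cosine similarity (a dot product, since vectors are
        unit-normalized) and return the top k as RAG context."""
        q = embed(query)
        ranked = sorted(index,
                        key=lambda item: sum(a * b for a, b in zip(item[1], q)),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

    doc = ("Refunds are accepted within 30 days of purchase. Enterprise customers "
           "may request extended return windows through their account manager. "
           "All refund requests require the original order ID.")
    index = [(c, embed(c)) for c in chunk(doc)]
    print(retrieve(index, "What is the refund window?", k=1))

In a real pipeline the index lives in a vector store (Weaviate, Qdrant, Pinecone), the embeddings come from a trained model, and the retrieved chunks, joined with entity context from the knowledge graph, become the prompt context for RAG, summarization, and alerting.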