Data Engineer, Knowledge Graphs

Mithrl · San Francisco, CA · Onsite

About The Position

We are hiring a Data Engineer, Knowledge Graphs, to build the infrastructure that powers Mithrl’s biological knowledge layer. You will partner closely with the Data Scientist, Knowledge Graphs, to turn curated knowledge sources into scalable, reliable, production-ready systems that serve the entire platform. Your work includes building ETL pipelines for large biological datasets, designing schemas and storage models for graph-structured data, and creating the API surfaces that let ML engineers, application teams, and the AI Co-Scientist query and use the knowledge graph efficiently. You will also own the reliability, performance, and versioning of the knowledge graph infrastructure across releases. This role is the bridge between biological knowledge ingestion and the high-performance engineering systems that use it. If you enjoy data modeling, schema design, graph storage, ETL, and scalable infrastructure, this is an opportunity to have a deep impact on the intelligence layer of Mithrl.

Requirements

  • Strong experience as a data engineer or backend engineer working with data-intensive systems
  • Experience building ETL or ELT pipelines for large structured or semi-structured datasets
  • Strong understanding of database design, schema modeling, and data architecture
  • Experience with graph data models or willingness to learn graph storage concepts
  • Proficiency in Python or similar languages for data engineering
  • Experience designing and maintaining APIs for data access
  • Understanding of versioning, provenance, validation, and reproducibility in data systems
  • Experience with cloud infrastructure and modern data stack tools
  • Strong communication skills and ability to work closely with scientific and engineering teams

Nice To Haves

  • Experience with graph databases or graph query languages
  • Experience with biological or chemical data sources
  • Familiarity with ontologies, controlled vocabularies, and metadata standards
  • Experience with data warehousing and analytical storage formats
  • Previous work at a tech-bio company or in a scientific platform environment

Responsibilities

  • Build and maintain ETL pipelines for large public biological datasets and curated knowledge sources
  • Design, implement, and evolve schemas and storage models for graph-structured biological data
  • Create efficient APIs and query surfaces that allow internal teams and AI systems to retrieve nodes, relationships, pathways, annotations, and graph analytics
  • Partner closely with the Data Scientists to operationalize curated relationships, harmonized variable IDs, metadata standards, and ontology mappings
  • Build data models that support multi-tenant access, versioning, and reproducibility across releases
  • Implement scalable storage and indexing strategies for high volume graph data
  • Maintain data quality, validate data integrity, and build monitoring around ingestion and usage
  • Work with ML engineers and application teams to ensure the knowledge graph infrastructure supports downstream reasoning, analysis, and discovery applications
  • Support data warehousing, documentation, and API reliability
  • Ensure performance, reliability, and uptime for knowledge graph services

Benefits

  • Comprehensive PPO health coverage through Anthem (medical, dental, and vision) and a 401(k), with top-tier plans

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

No Education Listed

Number of Employees

11-50 employees