Data Engineer (m/f/d)

Noxtua
Remote

About The Position

As a Data Engineer (m/f/d), you will play a key role in our Data Expansion Squad, which integrates and operationalizes legal data from multiple jurisdictions. The squad transforms heterogeneous source data into a unified, high-quality foundation that powers search, retrieval, and AI-supported workflows across our products. You will work closely with AI, engineering, and legal domain experts to adapt and extend existing data workflows for new customer datasets and source formats. Your work will focus on understanding source structures, defining robust mappings, standardizing and enriching content, and integrating data so that it is reliable, scalable, and easy to use in downstream systems. The squad provides the data foundation, structure, and metadata that our agent-based systems need to retrieve relevant legal information efficiently and reliably across jurisdictions.

Requirements

  • At least 2 years of professional experience in data engineering, including involvement in successfully deployed projects
  • Strong Python skills with experience in designing robust data pipelines
  • Experience in building and maintaining reliable ETL and RAG pipelines
  • Solid understanding of data modeling, quality, filtering, validation, and consistency
  • Familiarity with containerization (Docker), CI/CD pipelines, and version control (Git)
  • Strong grasp of data structures, algorithms, system design principles, and software engineering best practices
  • English proficiency at the C2 level

Nice To Haves

  • Expertise in working with graph databases
  • Familiarity with developing and deploying NLP models

Responsibilities

  • Design, build, and optimize end-to-end ETL pipelines for legal data from multiple jurisdictions, including cleaning, transformation, chunking, validation, embedding, and ingestion into vector databases
  • Work extensively with XML-based legal data feeds: parse, validate, normalize, and transform XML structures into scalable internal schemas and unified document formats
  • Develop and maintain data models and storage schemas that support continuously updated datasets while ensuring consistency, scalability, and accuracy across diverse sources and large data volumes
  • Coordinate data handover and integration from multiple internal and external data providers, including official sources, APIs, and web scraping pipelines, ensuring reliable and timely updates
  • Implement and continuously refine metadata enrichment strategies to maximize searchability, ranking quality, and relevance of legal information in vector databases
  • Build and maintain high-performance search and retrieval infrastructure that enables agent-based systems to call search functions and efficiently retrieve the most relevant legal information
  • Collaborate with product, AI, and legal domain experts to deliver high-quality, reliable data solutions
  • Own the end-to-end data integration for one jurisdiction

Benefits

  • 100% remote work possible with residence in Germany; other countries upon request
  • Flexible working hours
  • 26 vacation days, plus December 24th and 31st off
  • One additional vacation day per year of employment (up to 30 days)
  • Urban Sports Club Membership, depending on location
  • Laptop (Lenovo or Mac)
  • €1,000 net home office setup budget (paid with your first salary)