AI Data Engineering Lead

Ideagen
Remote

About The Position

Mazlan+ Models is Ideagen’s programme to build domain‑specific AI models for regulated industries, where the quality of data directly determines the quality of outcomes. As AI Data Engineering Lead, you will own the data foundation every model is built on and shape how trusted AI is delivered at scale. This is a leadership role combining strategy, architecture, and governance with hands‑on impact across sourcing, transforming, versioning, and preparing high‑value regulated data for training. You will lead a growing team of data engineers and work closely with AI engineering, legal, and domain experts to ensure our models are accurate, compliant, and ready for real‑world use.

Requirements

  • You are a senior data engineer or technical lead with prior experience leading teams and owning large data platforms end to end
  • You have deep production experience with Python and SQL and write data transformation code that is robust, readable, and reusable
  • You have designed and run AWS data stacks at scale, including services such as S3, Glue, Athena, Kinesis, Lambda, and IAM
  • You understand ML training data pipelines and know how they differ from analytics workloads, including dataset formats, splits, and quality constraints
  • You bring strong data governance instincts and design for versioning, lineage, and auditability from day one
  • You are comfortable working with legal and compliance partners on sensitive data handling and regulatory requirements
  • You communicate clearly across disciplines and work effectively with AI engineers, product leaders, and domain specialists

Nice To Haves

  • Experience with NLP or LLM training data, data version control tools, or regulated industry software is valuable but not essential

Responsibilities

  • Leading and developing a team of AI data engineers, setting clear technical standards, supporting career growth, and scaling the function as the programme grows
  • Defining the technical direction for AI data engineering, including architecture decisions, tooling choices, and delivery practices across the organisation
  • Designing and building the end‑to‑end AI data platform, from operational product data and regulatory sources through cloud storage and transformation pipelines to training‑ready datasets
  • Owning dataset versioning and lineage so every training artefact is traceable, reproducible, and auditable across the full model lifecycle
  • Building and maintaining large‑scale regulatory and operational corpora in collaboration with domain experts, ensuring data quality and consistency
  • Architecting and operating AWS‑based data infrastructure at production scale with a focus on reliability, security, and performance
  • Defining and enforcing data governance standards, including quality checks, labelling conventions, and data handling frameworks
  • Leading GDPR compliance for AI training data in partnership with Legal and ensuring best practice is embedded from the start
