AI Data Engineer

Giesecke+Devrient • Montreal, QC

4d • CA$95,000 - CA$115,000

About The Position

Giesecke+Devrient is a globally leading SecurityTech company seeking a technical, execution-focused Data Engineer to join its new AI Hub. The ideal candidate will combine hands-on experience in data engineering for AI systems with strong Python, SQL, and data pipeline engineering capabilities. The role supports both AI engineering initiatives and machine learning projects by making enterprise data reliable, accessible, well-structured, and ready for production use. Its focus is data engineering for Generative AI, RAG, document ingestion, vector search, knowledge graphs, and machine learning workflows, including data preparation, data quality, feature engineering, and reusable data assets for AI solutions.

Requirements

Three or more (3+) years of hands-on experience in data engineering, analytics engineering, machine learning engineering, or related software/data development roles.
Experience building production-grade data pipelines, ETL/ELT workflows, APIs, data services, or distributed data processing systems.
Experience preparing data for machine learning projects, including data cleaning, feature engineering, dataset creation, and data quality validation.
Strong Python and SQL skills, with practical experience building reliable, maintainable, and testable data pipelines.
Hands-on experience with data engineering tools and frameworks such as Pandas, PySpark, Airflow, Dagster, Prefect, dbt, or similar technologies.
Practical knowledge of document ingestion, document parsing, chunking, embeddings, semantic search, hybrid search, and retrieval pipelines.
Hands-on experience with vector databases and search technologies such as pgvector, Pinecone, Weaviate, Milvus, OpenSearch, Elasticsearch, or similar platforms.
Experience with cloud data platforms, lakehouse patterns, object storage, relational databases, and data warehouse technologies.
Understanding of machine learning workflows, feature engineering, feature stores, model training data requirements, and dataset versioning.
Ability to implement data quality controls, validation tests, lineage, monitoring, access control, and governance-aware data workflows.
Ability to work with technical specifications, data contracts, architecture patterns, and engineering standards.
Strong problem-solving skills and ability to work in a fast-moving, delivery-focused environment.
Bachelor’s degree in Computer Science, Software Engineering, Data Engineering, Artificial Intelligence, Data Science, or a related field is preferred.

Nice To Haves

Experience with RAG, document processing, embeddings, vector databases, search systems, or knowledge graphs is strongly preferred.
Experience contributing to production-grade systems in enterprise, regulated, or security-sensitive environments is preferred.
Hands-on experience with graph databases or knowledge graph technologies such as Neo4j, RDF, SPARQL, graph data modeling, or entity-relationship extraction is considered an asset.
Experience working in specification-first, contract-driven, or Spec-Driven Development environments is considered an asset.
Master’s degree is considered an asset.

Responsibilities

Design, build, and maintain data pipelines that support AI engineering, RAG, and machine learning initiatives from experimentation through production.
Develop document ingestion and processing pipelines for structured, semi-structured, and unstructured enterprise content, including parsing, cleaning, normalization, metadata extraction, and enrichment.
Implement chunking strategies, embedding pipelines, indexing workflows, and retrieval-ready datasets for RAG and Graph RAG applications.
Build and maintain integrations with vector databases, search indexes, graph databases, data lakes, warehouses, and enterprise source systems.
Support knowledge graph initiatives by preparing entities, relationships, ontologies, metadata, and graph-ready data pipelines.
Prepare and transform data for machine learning projects, including data cleaning, labeling support, feature engineering, feature validation, and dataset versioning.
Implement data quality checks, lineage, observability, monitoring, and automated validation for AI and ML data pipelines.
Collaborate with data scientists, applied AI engineers, platform engineers, security, data governance teams, and business stakeholders to deliver scalable AI solutions.
Contribute to reusable ingestion components, data engineering patterns, technical standards, and best practices for the AI Hub.
Other duties as assigned.