Data Platform Engineer

Trulioo • San Diego, CA

49d • Hybrid

About The Position

Are you ready to embark on a career that truly affects people around the world? Trulioo invites you to be a catalyst for change in the dynamic realm of digital identity verification. As the global front-runner in our industry, we are redefining how businesses grow, innovate and comply online. Picture yourself at the forefront of innovation, contributing to our award-winning platform that enables organizations worldwide to quickly onboard customers, optimize costs and combat fraud. Fueled by Silicon Valley support, Trulioo stands as the trusted platform that can verify more than 5 billion people and 700 million business entities across 195 countries.

But Trulioo is more than a tech company. We are a united force of dedicated experts committed to establishing trust online, and we’re proud to be recognized as a BC Top Employer for the second consecutive year, reflecting our commitment to an inclusive, collaborative, and people-first workplace.

Headquartered in Vancouver, with strategic hubs in San Diego and Dublin, we foster a culture of collaboration and open communication. Our offices support a hybrid model, and staff typically work three days per week at a hub location. Join us where excitement meets innovation and contribute to a world where trust and technology unite.

Requirements

5+ years of professional software development or data engineering experience.
Strong programming skills in Python.
Experience with data modeling and schema design in SQL and NoSQL systems.
Experience designing and maintaining data pipelines (Airflow, Dagster, Prefect, or similar).
Proficiency with cloud-based data services (AWS, GCP, Azure).
Proficiency in multiple programming languages.
Experience with entity resolution or record linkage algorithms.
Experience incorporating ML workflows into ETL pipelines.
Hands-on experience with Vector Databases and embedding-based search pipelines.
Familiarity with graph databases (Neo4j, Amazon Neptune) and graph query languages such as Gremlin for ETL, modeling, and querying.
Experience with OpenSearch / Elasticsearch, including index creation, tuning, and advanced queries.
Familiarity with streaming data systems (Kafka, Kinesis) or distributed processing frameworks (Spark, Flink).
Knowledge of semantic search, RAG pipelines, or LLM-enhanced retrieval.
Experience with containerization and orchestration (Docker, Kubernetes) and CI/CD pipelines.
Background in information retrieval, knowledge graphs, or data platform architecture.
Experience using data catalog/lineage tools (OpenMetadata, DataHub, etc.).
Strong experience with modern ETL tools for both large and small data processing (PySpark, Dask, DuckDB, etc.).

Nice To Haves

Strong analytical and problem-solving abilities.
Excellent communication and collaboration across technical and non-technical teams.
Curious, proactive, and adaptable to emerging data technologies.
Self-starter with ownership and accountability for delivering high-quality solutions.

Responsibilities

Build, optimize, and maintain data ingestion and transformation pipelines from multiple sources (internal systems, vendor data, web data, APIs).
Design and implement data models using the most suitable tool for the task: SQL, NoSQL, graph databases, or vector databases.
Integrate machine learning models into pipelines for entity resolution, de-duplication, semantic enrichment, and embedding generation.
Work with vector databases (e.g., Amazon S3 Vectors, PostgreSQL with pgvector, OpenSearch) to support similarity and semantic search applications.
Collaborate with data scientists, software engineers, and analysts to deliver reliable, high-performance data infrastructure.
Ensure data quality, consistency, and performance monitoring across all pipelines and systems.