Senior Data Engineer, AI Systems

Movable Ink • Toronto, ON

CA$144,000 - CA$188,000

About The Position

Movable Ink scales content personalization for marketers through data-activated content generation and AI decisioning. The world’s most innovative brands rely on Movable Ink to maximize revenue, simplify workflows, and boost marketing agility. Headquartered in New York City with close to 600 employees, Movable Ink serves its global client base with operations throughout North America, Central America, Europe, Australia, and Japan.

The AI Systems team owns the core recommendations engine and ML platform that power billions of AI-driven marketing decisions daily across some of the world's largest consumer brands. As a Senior Data Engineer, you will own the Spark-based data pipelines and data infrastructure at the heart of this system: building, scaling, and optimizing the data layer that feeds our production ML models. You will work alongside ML engineers and scientists in a collaborative environment, contributing data pipelines and products that power our core recommender systems and our DaVinci Personalization product. This is an opportunity to work end-to-end on large-scale data systems that touch millions of customers, on a team at the intersection of data engineering and machine learning.

Requirements

5+ years of data engineering experience
Deep expertise with Apache Spark, including the PySpark DataFrame API and experience solving challenging scaling problems
Experience with large-scale data processing, cluster configuration, optimization, and tuning (we use GCP Dataproc)
Strong software development skills in Python (unit testing, git, code review, CI/CD)
Experience with data storage formats (we use Parquet, Delta Lake)
Experience with event streaming data (we use Kafka)
Experience with cloud computing platforms (we use Google Cloud Platform)
Experience with advanced query optimization
Familiarity with software development lifecycle practices, such as continuous integration/continuous delivery and automated deployment (we use Docker, Kubernetes, and GitHub Actions)
Ability to collaborate with technical partners - you'll be working closely with ML engineers, scientists, and other teams to determine requirements and make design decisions
Enjoyment of working in a fast-paced, goal-driven environment

Responsibilities

Build, maintain, and optimize production data pipelines that power AI-driven personalization at scale across content selection, send-time optimization, subject line personalization, and frequency capping
Own and scale Spark-based batch pipelines, including cluster configuration, tuning, and performance optimization across GCP Dataproc
Build and maintain our ML Data Lake, ensuring data quality, accessibility, and efficient storage
Support the data needs of ML engineers and scientists for model development, training, and evaluation
Identify and resolve performance bottlenecks and scaling limitations in data pipelines and infrastructure
Collaborate with distributed systems engineers on the platform's architectural evolution, ensuring data layer continuity throughout
Continuously improve data infrastructure for greater scalability and reliability
Release features and data products that deliver measurable business value