Senior / Staff Data Engineer — Data Processing & Platform

Index Exchange • Toronto, ON

Hybrid

About The Position

At Index Exchange, we’re reinventing how digital advertising works, at scale. As a global advertising supply-side platform, we empower the world’s leading media owners and marketers to thrive in a programmatic, privacy-first ecosystem. We’re a proud industry pioneer with over 20 years of experience accelerating the ad technology evolution. Our proprietary tech is trusted by some of the world’s largest brands and media owners and plays a crucial role in keeping the internet open, accessible, and largely free.

We process more than 550 billion real-time auctions every day with ultra-low latency. Our platform is vertically integrated from servers to networks and runs primarily on our own metal and cloud infrastructure. This end-to-end infrastructure is designed to provide both stability and agility, enabling us to adapt quickly as the market evolves.

At the core of it all is our engineering-first culture. Our engineers tackle internet-scale problems across tight-knit, global teams. From moving petabytes of data and optimizing with AI to making real-time infrastructure decisions, Indexers have the agency and influence to shape the future of advertising. We move fast, build thoughtfully, and stay grounded in our core values.

We are hiring a Senior / Staff Data Engineer to build and evolve the data processing and pipeline layer that powers reporting, billing systems, and real-time data products at Index Exchange. This role focuses on designing and operating large-scale batch and streaming data pipelines, enabling reliable, scalable, and efficient data transformation across the platform. You will work on systems that transform raw, high-volume event data into clean, queryable, and production-grade datasets, supporting both API-driven data products and analytical workflows.

You will work on high-scale data systems that:
Process billions of events per day across distributed pipelines
Power core business datasets (reporting, billing, marketplace metrics)
Operate across batch (Spark) and streaming (Kafka / Flink) architectures
Require careful balancing of data correctness, processing efficiency, and latency vs cost trade-offs

You will solve problems such as:
Designing pipelines that scale without exploding compute costs
Managing data correctness at scale (deduplication, late data, joins)
Building systems that support both historical backfills and near real-time updates
Evolving pipelines from centralized processing (Hadoop) toward more distributed and efficient patterns
Streaming pipelines and streaming DWs

Requirements

Strong experience in data engineering at scale
Deep expertise in Spark (required)
Deep expertise in SQL and data modeling
Experience with Airflow or workflow orchestration
Experience with Kafka or streaming systems
Strong understanding of distributed data processing
Strong understanding of data modeling (large-scale datasets)
Strong understanding of performance optimization
Ability to own pipelines end-to-end
Ability to debug complex data issues
Ability to work in high-scale, evolving environments

Nice To Haves

Define data processing standards and patterns across teams
Lead large-scale pipeline and platform initiatives
Influence data architecture and modeling decisions
Drive improvements across reliability, cost efficiency, and scalability

Responsibilities

Design and operate pipelines using Spark (primary) and Kafka / Flink (streaming).
Transform raw event data into cleaned datasets (silver layer) and business-ready datasets (gold / reporting tables).
Build and maintain canonical datasets (aggregated datasets, reporting tables).
Define data contracts and ensure consistency across pipelines.
Support evolving use cases: reporting, billing, ML / experimentation.
Build and maintain Airflow DAGs for pipeline scheduling, dependency management, and backfills.
Improve reliability and observability of workflows.
Optimize pipelines for performance (runtime, throughput), cost (compute efficiency), and scalability (data growth).
Improve partitioning strategies, data layout, and job execution patterns.
Build pipelines that support incremental updates, streaming transformations, and aggregation at scale.
Contribute to evolving patterns such as edge aggregation, streaming → batch convergence, and real-time data availability.
Define and evolve data processing patterns: batch vs streaming, aggregation strategies, incremental vs full recompute.
Work across Spark (core processing), Kafka (transport), Flink (streaming compute), and storage systems (Hadoop / Ceph).
Contribute to data platform architecture decisions, pipeline standardization, and reusable data processing frameworks.
Influence trade-offs: latency vs cost, correctness vs performance, compute vs storage.

Benefits

Comprehensive health, dental, and vision plans for you and your dependents
Paid time off, health days, and personal obligation days plus flexible work schedules
Competitive retirement matching plans
Equity packages
Generous parental leave available to birthing, non-birthing, and adoptive parents
Annual well-being allowance plus fitness discounts and group wellness activities
Commuter benefits and discounts, where available
Employee assistance program
Mental health first aid program that provides an in-the-moment point of contact and reassurance
One day of volunteer time off per year and a donation-matching program
Bi-weekly town halls and regular community-led team events
Multiple resources and programming to support continuous learning
A workplace that supports a diverse, equitable, and inclusive environment