Associate Director, Principal Data Engineer

RBC • Toronto, ON • Onsite

About The Position

As a Data Engineering Solution Architect, you will join a highly talented team at RBC Capital Markets responsible for designing and delivering next-generation data engineering platforms on Databricks. You will be the hands-on technical anchor who takes proofs of concept (POCs) through to full production delivery, working across large-scale Spark-based data pipelines, ML engineering workflows, and Delta Lake architectures. You will bring deep, practitioner-level expertise in Databricks and Apache Spark: not just designing solutions on whiteboards, but rolling up your sleeves to build, tune, and deliver them. You will work closely with data scientists, ML engineers, business analysts, and platform teams to architect and implement scalable, high-performance data solutions that power critical financial workflows across RBC. You will bring a strong engineering mindset, technical leadership, and the energy and rigor needed to raise the bar on data engineering standards across the organization.

Requirements

8+ years of hands-on, production-grade experience with Apache Spark and Databricks, including Spark code development and performance tuning.
Databricks Certified Data Engineer Associate or Professional (mandatory prerequisite).
Databricks Certified Machine Learning Associate or Professional (mandatory prerequisite).
Proven track record of leading and delivering end-to-end data engineering solutions on Databricks in a financial services or similarly complex enterprise environment.
Deep expertise in Delta Lake, Delta Live Tables, Unity Catalog, and the Databricks Lakehouse architecture.
Strong proficiency in Python (PySpark) and/or Scala for Spark development, with demonstrable experience in performance tuning (partitioning, caching, shuffle optimization, adaptive query execution).
Experience architecting and delivering POCs independently, from scoping and prototyping through to stakeholder-ready demonstration.
Hands-on experience with cloud platforms (Azure or AWS) for big data workloads, including cloud storage, networking, and IAM.
Solid understanding of MLflow lifecycle management, model versioning, and ML pipeline orchestration within Databricks.
Familiarity with CI/CD tooling for data pipelines, such as GitHub Actions, Azure DevOps, or equivalent.

Nice To Haves

Experience with real-time streaming architectures using Kafka, Spark Structured Streaming, or Delta Live Tables.
Exposure to Databricks on Azure (ADLS Gen2, Azure Data Factory integration) or on AWS (S3, Glue integration).
Experience with infrastructure-as-code tooling such as Terraform for Databricks workspace provisioning and cluster management.
Familiarity with Agile/Scrum delivery methodologies.
Knowledge of data governance frameworks, data cataloguing tools, and data quality standards in a regulated financial environment.
Additional Databricks certifications (e.g. Databricks Certified Associate Developer for Apache Spark) are a plus.

Responsibilities

Lead the end-to-end architecture, design, and hands-on delivery of data engineering solutions on the Databricks Lakehouse Platform, from POC through to production.
Drive Spark code development, optimization, and performance tuning to ensure high-performance, cost-efficient data pipelines at scale.
Architect and implement Delta Lake solutions including schema design, medallion architecture, data quality frameworks, and incremental ingestion patterns.
Champion best practices in Databricks Workflows, Unity Catalog, Auto Loader, and Structured Streaming for both batch and real-time data processing.
Design and build ML engineering pipelines using MLflow, Feature Store, and Model Serving within the Databricks ecosystem.
Collaborate closely with product owners, business analysts, data scientists, and platform engineers to translate business requirements into robust technical solutions.
Conduct performance benchmarking and tuning of Spark jobs, cluster configurations, and storage layouts to optimize cost and runtime.
Establish and enforce coding standards, peer review practices, and CI/CD pipelines for data engineering workloads.
Proactively evaluate emerging Databricks and Apache Spark capabilities and assess their applicability to current and future RBC use cases.
Provide technical mentorship and hands-on guidance to junior and mid-level data engineers across the team.