Data Developer

RBC, Toronto, ON

About The Position

What is the opportunity?

At RBC, our data engineering team enhances visibility into assets across the Public Cloud and Application Security landscape. Our mission is to provide clear insight into digital infrastructure so that security risks can be identified and managed effectively. As a Data Developer, you will be a key member of the team, driving the development of a cloud-based data platform that powers analytics and operational reporting. We use industry-leading tools such as Databricks, Python, and SQL to turn data into strategic assets. You will design reliable ingestion pipelines and make data accessible while maintaining robust security controls. Collaboration is central to our success, and we foster an innovative environment where team members apply their technical skills to drive continuous improvement in cloud security and data use across the organization.

Requirements

  • Bachelor's degree in Computer Science, Data Engineering, Information Systems, or a related field
  • 3+ years of experience in data development, preferably in cloud-based environments
  • Expert-level proficiency in Python, including advanced language features, object-oriented programming, and design patterns
  • Strong experience with Python data libraries (Pandas, NumPy) and testing frameworks (pytest, unittest)
  • Deep understanding of PySpark for distributed data processing and large-scale transformations
  • Strong SQL skills for complex data queries and transformations
  • Hands-on experience with Databricks platform including Delta Lake, Unity Catalog, and Lakehouse Architecture
  • Experience building CDC pipelines and implementing real-time data synchronization solutions
  • Experience managing cloud environments, particularly Azure cloud services
  • Proven ability to write clean, maintainable, and well-documented Python code following best practices
  • Understanding of data governance frameworks and compliance requirements
  • Ability to work in fast-paced environments and adapt to changing priorities
  • English fluency, verbal and written
  • Strong problem-solving skills and an engineering mindset

Nice To Haves

  • Familiarity with CI/CD methodologies and Infrastructure-as-Code (Terraform)
  • Experience with Databricks Workflows or Apache Airflow for orchestration
  • Knowledge of source code management (SCM) and version control tools
  • Databricks certifications (e.g., Databricks Certified Data Engineer)
  • Exposure to Docker and containerization technologies
  • Understanding of business intelligence and reporting tools (e.g., Tableau, Power BI)
  • Familiarity with Cyber Security concepts and secure data practices
  • Experience with data modeling and dimensional design

Responsibilities

  • Develop and maintain Databricks-based data platform using Azure Databricks, leveraging Python, PySpark, and Spark SQL to support analytics and operational reporting
  • Design robust data ingestion and transformation pipelines using Python and PySpark to efficiently process large datasets
  • Build and manage CDC (Change Data Capture) pipelines leveraging Python for real-time data synchronization and incremental data loads
  • Develop and optimize ELT/ETL workflows using Databricks Workflows or Apache Airflow, with Python-based orchestration and automation
  • Design and manage Delta Lake solutions for data versioning, efficient data storage, and schema evolution
  • Write production-grade Python code for data processing, pipeline automation, and custom data transformations
  • Ensure datasets are clean, reliable, and ready for consumption by implementing data quality checks and validation processes using Python and SQL
  • Implement data governance and compliance standards using Unity Catalog for access management and data lineage tracking
  • Collaborate with cross-functional teams including data scientists, analysts, and business stakeholders to understand data requirements and deliver actionable insights
  • Monitor, troubleshoot, and optimize Spark jobs for performance, addressing pipeline bottlenecks and ensuring cost efficiency
  • Implement CI/CD methodologies for automated deployment and testing of data pipelines using Python-based frameworks
  • Develop reusable Python libraries and frameworks to accelerate data platform development
  • Develop and maintain comprehensive documentation for data pipelines, transformations, and data models
  • Contribute to data platform enhancements that drive excellence across multiple business units

Benefits

  • A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
  • Leaders who support your development through coaching and managing opportunities
  • Work in a dynamic, collaborative, progressive, and high-performing team
  • A world-class training program in financial services
  • Flexible work/life balance options
  • Opportunities to do challenging work
  • Opportunities to take on progressively greater accountabilities
  • Opportunities to build close relationships with clients