Senior Data Engineer

Capgemini•Vancouver, BC

10d•$76,200 - $176,590

About The Position

Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you’d like, where you’ll be supported and inspired by a collaborative community of colleagues around the world, and where you’ll be able to reimagine what’s possible. Join us and help the world’s leading organizations unlock the value of technology and build a more sustainable, more inclusive world.

Job Summary

We are seeking an experienced Data Engineer with strong expertise in Databricks, Apache Airflow, Python, and PySpark to design, build, and maintain scalable, high-performance data solutions. The ideal candidate will be responsible for developing efficient data pipelines, orchestrating workflows, and ensuring the reliability and quality of data systems that support analytics and business operations.

Requirements

Strong hands-on experience in Python programming
Expertise in PySpark and Apache Spark for large-scale data processing
Experience with Apache Airflow for workflow scheduling and orchestration
Practical experience with the Databricks platform (notebooks, jobs, clusters, Delta Lake)
Solid understanding of ETL/ELT concepts and data pipeline architecture
Proficiency in SQL and working with relational and non-relational databases
Experience working with cloud platforms (Azure preferred; AWS or GCP acceptable)
Strong debugging and root cause analysis skills
Familiarity with data modeling concepts
Understanding of data security and governance best practices

Nice To Haves

Experience with Azure Data Factory, Azure Data Lake, or similar services
Knowledge of CI/CD pipelines and DevOps practices
Exposure to streaming technologies (Kafka, Spark Streaming, etc.)
Experience with version control tools (Git)
Familiarity with monitoring tools and logging frameworks

Responsibilities

Design, develop, and maintain scalable ETL/ELT data pipelines using Python and PySpark
Build and manage workflow orchestration using Apache Airflow
Develop and optimize data processing solutions on Databricks, leveraging Spark and Delta Lake
Perform data ingestion from multiple sources (databases, APIs, files, cloud systems)
Implement data transformations, cleansing, and aggregation to support downstream analytics
Monitor, troubleshoot, and resolve job failures and performance issues
Optimize jobs for performance, scalability, and cost efficiency
Ensure data quality, consistency, and governance across all pipelines
Collaborate with data analysts, data scientists, and business stakeholders to deliver data solutions
Maintain proper documentation for workflows, pipelines, and data models
Support deployment processes and ensure smooth release of data solutions