Senior Python Developer with Spark

CGI
Reston, VA
Hybrid

About The Position

We are seeking a Senior Python Developer with Spark skills to design, build, and optimize large-scale data processing systems in a cloud-native AWS environment. This role focuses heavily on developing high-performance data pipelines using Spark and Python, as well as working with complex relational datasets. We partner with 15 of the top 20 banks globally, and our top 10 banking clients have worked with us for an average of 26 years. This role is located at a client site in Reston, VA; a hybrid working model is acceptable.

Requirements

  • 8+ years of hands-on experience with Python for data engineering, including building and maintaining data pipelines
  • Deep expertise in Apache Spark, including:
      • Performance tuning (partitioning, caching, broadcast joins, shuffle optimization)
      • Understanding of execution plans (DAGs, stages, tasks)
      • Memory and resource management
  • Solid experience with big data ecosystems such as Hadoop, Hive, and EMR
  • Advanced proficiency in SQL, including:
      • Writing recursive CTEs for hierarchical data (e.g., org structures, parent-child relationships)
      • Query optimization, indexing strategies, and execution plan analysis
  • Strong experience with AWS services including EMR, Lambda, Step Functions, EventBridge, Redshift, S3, and Glue
  • Experience building and consuming APIs, along with data transformation and ingestion workflows
  • Proven ability to work with large-scale datasets, performing data analysis and extracting actionable insights
  • Familiarity with data modeling concepts (normalized/denormalized structures, handling hierarchical data)
  • Hands-on experience with CI/CD pipelines and tools such as GitLab and Terraform
  • Strong understanding of performance troubleshooting, including identifying bottlenecks in distributed systems
  • Ability to clearly explain technical decisions, especially around Spark optimization and SQL logic
  • Strong analytical thinking, problem-solving skills, and attention to detail
  • Effective collaboration skills in cross-functional, matrixed environments
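
To give candidates a sense of the Spark expertise described above, here is a minimal plain-Python sketch of the idea behind a broadcast (map-side) join: the small side of the join is shipped to every worker as a hash map, so the large side is joined without a shuffle. This is illustrative only; in PySpark you would hint the same pattern with `broadcast()` from `pyspark.sql.functions`, and the names here are made up, not from any client codebase.

```python
def broadcast_hash_join(large_rows, small_rows, key):
    # Build a hash map from the small side once ("broadcast" it).
    lookup = {row[key]: row for row in small_rows}
    # Stream the large side, probing the map: no repartition or shuffle needed.
    for row in large_rows:
        match = lookup.get(row[key])
        if match is not None:
            yield {**row, **match}

# Illustrative data: a large fact table joined to a small dimension table.
orders = [{"cust_id": 1, "amount": 250}, {"cust_id": 2, "amount": 75}]
customers = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Globex"}]

joined = list(broadcast_hash_join(orders, customers, "cust_id"))
```

Spark applies this strategy automatically when one side of a join fits under `spark.sql.autoBroadcastJoinThreshold`; tuning that threshold is part of the shuffle-optimization work this role involves.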

Nice To Haves

  • Experience in financial services or regulated environments
  • Exposure to data visualization tools
  • Familiarity with event-driven architectures on AWS

Responsibilities

  • Design, build, and optimize large-scale data processing systems in a cloud-native AWS environment.
  • Develop high-performance data pipelines using Spark and Python.
  • Work with complex relational datasets.
  • Diagnose and improve inefficient data workflows, particularly Spark jobs.
  • Write advanced SQL to support hierarchical and analytical use cases.
  • Partner with cross-functional teams to deliver scalable, reliable, and well-architected data solutions that support critical business functions in a financial services setting.

Benefits

  • Competitive compensation
  • Comprehensive insurance options
  • Matching contributions through the 401(k) plan and the share purchase plan
  • Paid time off for vacation, holidays, and sick time
  • Paid parental leave
  • Learning opportunities and tuition assistance
  • Wellness and Well-being program