Senior Data Engineer with DevOps

CGI•Dallas, TX

Onsite

About The Position

We are seeking a Data Engineer with 5+ years of experience to design and maintain scalable data pipelines supporting analytics, reporting, and operational needs. The role involves collaborating with cross-functional teams to ensure data aligns with business requirements and enterprise standards. The role requires working at our client site 5 days a week in Pittsburgh, PA; Cleveland, OH; or Dallas, TX. For this role on this particular client engagement, employer sponsorship of immigration-related visa and/or green card status as part of the PERM process will not be available.

Requirements

5+ years of experience in data engineering and big data processing
Strong expertise in Apache Spark (Spark Core, Spark SQL) and PySpark for large scale batch processing
Experience working with structured and semi-structured data, including complex transformations and performance tuning
Proficiency in data ingestion and integration from sources such as Oracle, SQL Server, Hive, HDFS, and S3, and in transforming data into curated data models
Experience writing data to Hive tables, Data Lakes (Iceberg), and downstream reporting systems
Strong knowledge of SQL and data modeling concepts
Hands-on experience with Apache Airflow for workflow orchestration (DAG design, scheduling expectations, monitoring)
Proficiency in shell scripting for job automation: triggering Spark jobs, performing file checks and validation, archiving and purging data, and managing job dependencies, logging, and error handling
Strong understanding of batch processing and batch job scheduling frameworks
Experience migrating from CA7/Control-M to Airflow (daily, hourly, weekly schedules)
CI/CD for data pipelines
Fundamentals of Linux and networking
Containerization with Docker, OCP, and Kubernetes
Knowledge of CI/CD pipeline and build tools such as Jenkins, GitHub Actions, Azure DevOps, GitLab CI, Maven, and Gradle
Ability to automate operational tasks using Python, Bash/Shell, and PowerShell
Experience implementing monitoring and alerting (e.g., Application Insights) and enabling centralized logging with tools such as ELK
Experience ensuring data quality, reliability, and compliance in regulated environments
Good communication and documentation skills

Responsibilities

Design and build scalable data pipelines aligned with business needs
Process large datasets (batch and, at times, near-real-time)
Ensure data quality, consistency, and governance standards across systems
Support data integration and transformation efforts for analytics and reporting platforms
Maintain data dictionaries, metadata, and documentation
Participate in data architecture reviews and model validation processes
Support analytics, reporting, and risk platforms