Kafka/Spark Developer

CGI•Pittsburgh, PA

2d•Onsite

About The Position

CGI is looking for mid-level Kafka and Spark Software Developers to join our Applications Development and Maintenance team, supporting our client, a large US bank, in an advanced technology environment. This role requires working at our client site in Pittsburgh, PA, 5 days a week.

Requirements

At least 5 years of experience in Big Data development, data engineering, or distributed data processing environments.
Strong hands-on experience with Apache Kafka, including topic configuration, producer/consumer development, Kafka Connect, and Schema Registry.
Extensive experience developing real-time data processing applications using Apache Spark Streaming and/or Spark Structured Streaming.
Proficiency in Java, Scala, or Python (PySpark) with strong object-oriented programming and software development skills.
Proficiency in writing and optimizing complex SQL queries using Impala, Hive, or similar distributed query engines.
Hands-on experience with Hadoop ecosystem components, including HDFS and Hive.
Experience integrating Kafka and Spark with relational databases, NoSQL databases, cloud storage platforms, and enterprise applications.
Strong analytical, troubleshooting, and performance tuning skills in distributed streaming environments.
Excellent communication, collaboration, and stakeholder management skills, with the ability to work effectively in Agile/Scrum teams.
Strong technical leadership and problem-solving skills.

Nice To Haves

Neo4j

Responsibilities

As a Kafka/Spark Software Developer, you will be responsible for developing and maintaining scalable big data solutions using Hadoop, Spark, Kafka, and Impala to support enterprise data processing and analytics initiatives.
Design, build, and optimize batch and real-time data pipelines for ingesting, processing, transforming, and delivering large volumes of structured and unstructured data.
Develop Spark applications using PySpark, Scala, or Java for data transformation, aggregation, cleansing, and analytical processing.
Build and maintain Kafka producers, consumers, topics, and streaming workflows to enable reliable real-time data ingestion and event-driven architectures.
Design and implement logical and physical data models to support data warehousing, reporting, analytics, and business intelligence requirements.
Monitor, troubleshoot, and tune Kafka and Spark streaming jobs to improve performance, scalability, and operational reliability.
Optimize Hadoop ecosystem components, Spark jobs, Kafka configurations, and Impala queries to improve system performance and resource utilization.
Collaborate with architects, data engineers, DevOps teams, and business stakeholders to design and implement modern streaming and event-driven data platforms.
Analyze user requirements and define technical project scope and assumptions for assigned tasks.
Create technical designs for new systems and/or modifications to existing systems.
Translate detailed requirements into functional system designs.
Prioritize work, meet deadlines, and establish and maintain effective working relationships with clients, project team members, supervisors, and employees from other departments.
Partner with business leaders, enterprise architects, and product owners to identify new graph-based use cases, evaluate emerging technologies, and align Neo4j initiatives with digital transformation goals.