As a Lead Data Engineer, you will lead a team of data engineers and collaborate with architects, engineers, information analysts, and business and technology stakeholders to develop and deploy enterprise-grade platforms that enable data-driven solutions. You will be responsible for the design and implementation of robust data pipelines, optimizing data processes, and ensuring data quality, security, and governance. The role requires knowledge of GCP services including Dataflow, Cloud Run functions, AlloyDB, Pub/Sub, Infrastructure as Code (IaC), BigQuery, and Dataplex. The platform is an event-driven architecture that processes both real-time and batch data from legacy external systems. This position can be located at any CGI office in the U.S.; the preferred location is Fairfax, VA, and a hybrid working model is acceptable.

Your future duties and responsibilities:
- Lead the design, development, and maintenance of robust data pipelines using Dataflow and Cloud Run functions that extract data from various sources, transform it into the desired format, and load it into the appropriate landing zone in Pub/Sub, Google Cloud Storage, or AlloyDB for PostgreSQL.
- Collaborate with FAA customers, the chief architect, senior engineers, information analysts, and business and technology stakeholders to develop and deploy enterprise-grade platforms that enable data-driven solutions.
- Develop and manage ETL (Extract, Transform, Load) processes using GCP Dataflow flex templates written in Java to transform data into XML and JSON documents, enriching them for downstream data consumers.
- Implement and manage advanced data models, including relational databases, non-relational databases, master data management, Dataplex, and data governance.
- Integrate data from legacy sources, including databases, data warehouses, APIs, and external systems, using Dataflow and Cloud Run functions.
- Ensure data consistency and integrity during the integration process, performing data validation and cleansing as needed, and manage data and its quality.
- Transform raw data into a usable format by applying data cleansing, aggregation, filtering, and enrichment techniques using Dataflow, Cloud Run functions, and other GCP services.
- Design and optimize data pipelines and data processing workflows for performance, scalability, and efficiency using Dataflow and Cloud Run functions.
- Monitor and tune data systems, identify and resolve performance bottlenecks, and implement performance gains and indexing strategies to enhance query performance.
- Implement data quality checks and validations within data pipelines to ensure the accuracy, consistency, and completeness of data.
- Optimize and administer data environments and data-related GCP services to ensure high performance and reliability.
- Collaborate with cross-functional teams to del
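The ETL duties above center on transforming legacy records into JSON and XML documents for downstream consumers. As a minimal, framework-free sketch of the per-record logic a Dataflow step might wrap (the pipe-delimited layout and field names are illustrative assumptions, not the actual formats of the legacy systems):

```java
// Sketch only: framework-free per-record transformation logic of the kind a
// Dataflow DoFn might wrap. The legacy record layout is a hypothetical example.
import java.util.LinkedHashMap;
import java.util.Map;

public class LegacyRecordTransform {

    /** Parse a pipe-delimited legacy record into an ordered field map. */
    public static Map<String, String> parse(String record) {
        String[] parts = record.split("\\|", -1);
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", parts[0]);
        fields.put("timestamp", parts[1]);
        fields.put("status", parts[2]);
        return fields;
    }

    /** Render the field map as a minimal JSON document for downstream consumers. */
    public static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(",");
            sb.append("\"").append(e.getKey()).append("\":\"")
              .append(e.getValue()).append("\"");
            first = false;
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        String legacy = "FL1234|2024-05-01T12:00:00Z|ACTIVE";
        System.out.println(toJson(parse(legacy)));
        // {"id":"FL1234","timestamp":"2024-05-01T12:00:00Z","status":"ACTIVE"}
    }
}
```

In a real pipeline this logic would run inside a Beam `DoFn` deployed via a Dataflow flex template, with the output published to Pub/Sub or written to Cloud Storage.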
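The data-quality duties above call for completeness and consistency checks embedded in the pipeline. A minimal sketch, assuming hypothetical rules (required `id` and `timestamp` fields, ISO-8601 timestamps); a production pipeline would source such rules from a governance catalog such as Dataplex:

```java
// Sketch only: a minimal data-quality check of the kind embedded in a
// pipeline stage. The rules and field names are illustrative assumptions.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DataQualityCheck {

    /** Return the rule violations for one record; an empty list means clean. */
    public static List<String> validate(Map<String, String> record) {
        List<String> violations = new ArrayList<>();
        // Completeness: required fields must be present and non-blank.
        for (String required : new String[] {"id", "timestamp"}) {
            String v = record.get(required);
            if (v == null || v.isBlank()) {
                violations.add("missing required field: " + required);
            }
        }
        // Consistency: timestamps must parse as ISO-8601 instants.
        String ts = record.get("timestamp");
        if (ts != null && !ts.isBlank()) {
            try {
                java.time.Instant.parse(ts);
            } catch (java.time.format.DateTimeParseException e) {
                violations.add("unparseable timestamp: " + ts);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        Map<String, String> good = Map.of("id", "FL1234",
                                          "timestamp", "2024-05-01T12:00:00Z");
        Map<String, String> bad = Map.of("id", "", "timestamp", "yesterday");
        System.out.println(validate(good)); // []
        System.out.println(validate(bad));  // two violations
    }
}
```

Records with a non-empty violation list would typically be routed to a dead-letter destination rather than dropped, preserving them for reconciliation.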
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed