Lead Data Engineer

CGI
Fairfax, VA
Hybrid

About The Position

As a Lead Data Engineer, you will lead a team of data engineers and collaborate with architects, engineers, information analysts, and business and technology stakeholders to develop and deploy enterprise-grade platforms that enable data-driven solutions. You will be responsible for the design and implementation of robust data pipelines, optimizing data processes, and ensuring data quality, security, and governance. The role requires knowledge of GCP services including Dataflow, Cloud Run Functions, AlloyDB, Pub/Sub, Infrastructure as Code (IaC), BigQuery, and Dataplex. The platform is an event-driven architecture that processes both real-time and batch data from legacy external systems. This position can be located at any CGI office in the U.S.; the preferred location is Fairfax, VA, and a hybrid working model is acceptable. Your future duties and responsibilities are listed under Responsibilities below.

Requirements

  • Minimum of 12 years of experience in data engineering, with at least 2 years of hands-on experience with Google Cloud services.
  • At least 8 years of experience in data solution design and management disciplines, including data integration, modeling, optimization, and data quality, directly relevant to data engineering responsibilities and tasks.
  • Project management - able to understand business needs and help the team prioritize and manage the workload effectively.
  • Data architecture - able to provide high-level guidance and oversight to the team, ensure changes are made in a way that preserves operational stability, long-term maintainability, and operational support, and understand and direct methodologies for a 24/7 application data operation.
  • Strong knowledge of SQL, Python, and Java; Dataflow flex templates are written in Java to implement the desired data transformations (see the transform sketch after this list).
  • Experience with GCP and related tooling such as GKE, IAM, Terraform, and CI/CD pipelines to deliver automated solutions.
  • Excellent problem-solving and debugging skills: able to trace issues to their source in unfamiliar code or systems, and to recognize and resolve recurring problems.
  • Strong communication skills and a proactive “getting things done” mindset.
  • Experience working in Agile teams and familiarity with Agile methodologies, including Scrum with two-week sprints.
  • Ability to design, build, and deploy data solutions that capture, explore, transform, and utilize data to support mission applications, insights, and reporting.
  • Experience with database technologies such as SQL, NoSQL, and AlloyDB for PostgreSQL.
  • Ability to collaborate within and across teams of different technical knowledge to support the delivery of data products.
  • Strong knowledge of data architecture in application development or reporting.
  • Excellent organizational and analytical abilities.
  • Good written and verbal communication skills.
  • Should be proficient in architecting data solutions.
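
For illustration, the following is a minimal sketch of the kind of transformation a Dataflow flex template would package: an Apache Beam DoFn, written in Java, that converts a pipe-delimited legacy record into a JSON document for downstream consumers. The record layout and field names are hypothetical, not taken from the actual FAA feeds.

    import org.apache.beam.sdk.transforms.DoFn;

    // Converts one pipe-delimited legacy record into a JSON document.
    // Hypothetical record layout: id|timestamp|payload
    public class LegacyRecordToJsonFn extends DoFn<String, String> {

      @ProcessElement
      public void processElement(@Element String record, OutputReceiver<String> out) {
        String[] fields = record.split("\\|", 3);
        if (fields.length < 3) {
          // Malformed records are dropped here; a production pipeline
          // would route them to a dead-letter output instead.
          return;
        }
        String json = String.format(
            "{\"id\":\"%s\",\"timestamp\":\"%s\",\"payload\":\"%s\"}",
            escape(fields[0]), escape(fields[1]), escape(fields[2]));
        out.output(json);
      }

      // Minimal JSON string escaping for backslashes and double quotes.
      private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
      }
    }

Packaged as a flex template, a DoFn like this runs inside a Dataflow job whose parameters (sources, sinks, formats) are supplied at launch time.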

Nice To Haves

  • Industry knowledge in sectors such as Government and Aviation preferred

Responsibilities

  • Lead the design, development, and maintenance of robust data pipelines using Dataflow and Cloud Run Functions that extract data from various sources, transform it into the desired format, and load it into the appropriate landing zone in Pub/Sub, Google Cloud Storage, or AlloyDB for PostgreSQL (a minimal pipeline sketch follows this list).
  • Collaborate with FAA customers, the chief architect, senior engineers, information analysts, and business and technology stakeholders to develop and deploy enterprise-grade platforms that enable data-driven solutions.
  • Develop and manage ETL (Extract, Transform, Load) processes using GCP Dataflow flex templates written in Java to transform data into XML and JSON documents, enriching them for downstream data consumers.
  • Implement and manage advanced data models, including relational databases, non-relational databases, master data management, Dataplex, and data governance.
  • Integrate data from legacy data sources, including databases, data warehouses, APIs, and external systems, using Dataflow and Cloud Run Functions.
  • Ensure data consistency and integrity during the integration process, performing data validation and cleaning as needed and managing data quality throughout.
  • Transform raw data into a usable format by applying data cleansing, aggregation, filtering, and enrichment techniques using Dataflow, Cloud Run Functions, and other GCP services.
  • Design and optimize data pipelines and data processing workflows for performance, scalability, and efficiency using Dataflow and Cloud Run Functions.
  • Monitor and tune data systems, identify and resolve performance bottlenecks, and implement performance improvements and indexing strategies to enhance query performance.
  • Implement data quality checks and validations within data pipelines to ensure the accuracy, consistency, and completeness of data.
  • Optimize and administer data environments and data-related GCP services to ensure high performance and reliability.
  • Collaborate with cross-functional teams to deliver data products.
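
To ground these items, here is a rough Apache Beam (Java) sketch of the streaming shape described above: read events from Pub/Sub, apply a data-quality check that routes malformed records to a dead-letter output, convert valid records to JSON with the DoFn sketched under Requirements, and write windowed files to a Cloud Storage landing zone. Project, subscription, and bucket names are placeholders, and the completeness rule stands in for real validation logic.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.options.StreamingOptions;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionTuple;
    import org.apache.beam.sdk.values.TupleTag;
    import org.apache.beam.sdk.values.TupleTagList;
    import org.joda.time.Duration;

    public class IngestPipeline {
      // Tags split validated records from rejects so bad data never
      // reaches downstream consumers.
      private static final TupleTag<String> VALID = new TupleTag<String>() {};
      private static final TupleTag<String> REJECTED = new TupleTag<String>() {};

      public static void main(String[] args) {
        StreamingOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().as(StreamingOptions.class);
        options.setStreaming(true);
        Pipeline p = Pipeline.create(options);

        // Placeholder subscription; a real job takes this as a template parameter.
        PCollection<String> events = p.apply("ReadEvents",
            PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/legacy-events"));

        // Data-quality gate: records missing fields go to the dead-letter output.
        PCollectionTuple checked = events.apply("QualityCheck",
            ParDo.of(new DoFn<String, String>() {
              @ProcessElement
              public void process(@Element String msg, MultiOutputReceiver out) {
                if (msg.split("\\|", -1).length == 3) {
                  out.get(VALID).output(msg);
                } else {
                  out.get(REJECTED).output(msg);
                }
              }
            }).withOutputTags(VALID, TupleTagList.of(REJECTED)));

        // Valid records become JSON and land in Cloud Storage in 5-minute windows.
        checked.get(VALID)
            .apply("ToJson", ParDo.of(new LegacyRecordToJsonFn()))
            .apply("Window", Window.into(FixedWindows.of(Duration.standardMinutes(5))))
            .apply("WriteLanding", TextIO.write()
                .to("gs://my-landing-zone/events/part")
                .withWindowedWrites()
                .withNumShards(1));

        // Rejected records are preserved for inspection and reprocessing.
        checked.get(REJECTED)
            .apply("WindowRejects", Window.into(FixedWindows.of(Duration.standardMinutes(5))))
            .apply("WriteDeadLetter", TextIO.write()
                .to("gs://my-landing-zone/rejects/part")
                .withWindowedWrites()
                .withNumShards(1));

        p.run();
      }
    }

The dead-letter branch is what makes the quality checks actionable: rejected records stay queryable instead of silently disappearing, which matters for a 24/7 data operation.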

Benefits

  • Competitive compensation
  • Comprehensive insurance options
  • Matching contributions through the 401(k) plan and the share purchase plan
  • Paid time off for vacation, holidays, and sick time
  • Paid parental leave
  • Learning opportunities and tuition assistance
  • Wellness and Well-being programs