Big Data Lead

HEXAWARE • United States

About The Position

This role focuses on designing and implementing scalable data pipelines for ingesting and transforming data, primarily on Databricks with PySpark notebooks, Spark SQL, and Python. The position involves developing ETL processes to extract, transform, and load data from diverse sources into a data lakehouse architecture on Databricks. A key part of the role is analyzing the existing integration landscape, which includes Teradata (TPT, BTEQ), Talend, and IBM Sterling. The Big Data Lead will define the ingestion and integration strategy for Databricks, ensure seamless data flow from source systems to the Lakehouse, lead integration design, and oversee pipeline migration. Optimizing data processing workflows for performance and efficiency using Databricks capabilities is also essential. The role further requires ensuring data security and compliance with data privacy regulations, delivering high-quality data products, collaborating with stakeholders to understand requirements and create technical solutions using the Microsoft Azure stack, and creating comprehensive documentation of workflows, pipelines, and architecture.

Requirements

Experience with Databricks, PySpark notebooks, Spark SQL, and Python.
Experience developing ETL processes.
Experience with diverse data sources.
Experience with data lakehouse architecture.
Experience analyzing an existing integration landscape that includes Teradata (TPT, BTEQ), Talend, and IBM Sterling.
Experience defining data ingestion and integration strategies.
Experience with data pipeline migration.
Experience optimizing data processing workflows.
Knowledge of data security and data privacy regulations.
Experience delivering data products based on business requirements.
Experience collaborating with stakeholders.
Experience creating technical solutions using the Microsoft Azure stack.
Experience creating comprehensive documentation of workflows, pipelines, and architecture.

Responsibilities

Design and implement scalable data pipelines for ingesting and transforming data primarily using Databricks and leveraging PySpark notebooks, Spark SQL, and Python.
Develop ETL processes to extract, transform, and load data from diverse sources into a data lakehouse architecture on Databricks.
Analyze the existing integration landscape, which includes Teradata (TPT, BTEQ), Talend, and IBM Sterling.
Define the ingestion and integration strategy for Databricks.
Ensure seamless data flow from source systems to the Lakehouse.
Lead integration design and oversee pipeline migration.
Optimize data processing workflows to enhance performance and efficiency using Databricks capabilities.
Ensure data security and compliance with data privacy regulations throughout the data engineering process.
Deliver high-quality data products based on business requirements.
Collaborate with stakeholders to gather and understand requirements and create technical solutions using the Microsoft Azure stack.
Create comprehensive documentation of workflows, pipelines, and architecture.