Data Lead Engineer (W2 Candidates Only)

Mega Cloud Lab
Fremont, CA
Onsite

About The Position

This role is for a Data Lead Engineer with a focus on designing and modernizing enterprise data platforms. The ideal candidate will have extensive experience in architecting scalable ETL/ELT pipelines, leading enterprise-scale Data Architecture initiatives, and engineering large-scale distributed processing workloads. Experience with cloud-native platforms on AWS and Azure is essential, as is a strong understanding of data modeling, governance, and real-time streaming architectures.

Requirements

  • Azure Data Factory (ADF)
  • Azure Databricks & PySpark
  • Azure Synapse
  • Azure SQL
  • Python
  • Spark SQL
  • 12+ years of experience designing and modernizing enterprise data platforms.
  • Experience architecting scalable ETL/ELT pipelines using Python, PySpark, Databricks, AWS Glue and Azure Data Factory.
  • Experience leading enterprise-scale Data Architecture initiatives, defining logical and physical data models, governance standards and cloud-native platform blueprints across AWS and Azure environments.
  • Experience designing and implementing Medallion (Bronze/Silver/Gold) Lakehouse architectures using Delta Lake, S3, ADLS Gen2, Snowflake, Redshift and Synapse Analytics.
  • Experience engineering large-scale distributed processing workloads using Apache Spark, PySpark, Databricks, EMR, Hive and HDFS.
  • Experience orchestrating complex data workflows using Apache Airflow, Databricks Workflows, AWS Step Functions and Azure Data Factory triggers.
  • Strong hands-on experience in Advanced SQL, including complex joins, CTEs, window functions, stored procedures, indexing strategies, partitioning and execution plan optimization across Snowflake, PostgreSQL and Oracle.
  • Experience building real-time streaming architectures using Apache Kafka, AWS Kinesis, Azure Event Hub and Service Bus.
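As a sketch of the advanced SQL patterns named above (CTEs and window functions), the snippet below uses the SQLite engine bundled with Python; the table, columns, and data are invented purely for illustration, not taken from any system mentioned in this posting.

```python
import sqlite3

# Hypothetical sales table for the illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('west', '2024-01-05', 120.0),
        ('west', '2024-01-06', 80.0),
        ('east', '2024-01-05', 200.0),
        ('east', '2024-01-07', 50.0);
""")

# A CTE feeds a window function that ranks orders by amount within each region.
rows = conn.execute("""
    WITH regional AS (
        SELECT region, order_date, amount FROM orders
    )
    SELECT region, order_date, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM regional
    ORDER BY region, rnk
""").fetchall()

for row in rows:
    print(row)  # e.g. ('east', '2024-01-05', 200.0, 1)
```

The same CTE-plus-window shape carries over to Snowflake, PostgreSQL and Oracle, though partitioning and execution-plan tuning are engine-specific.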

Responsibilities

  • Designing and modernizing enterprise data platforms across banking, healthcare and retail domains.
  • Architecting scalable ETL/ELT pipelines using Python, PySpark, Databricks, AWS Glue and Azure Data Factory.
  • Leading enterprise-scale Data Architecture initiatives, defining logical and physical data models, governance standards and cloud-native platform blueprints across AWS and Azure environments.
  • Designing and implementing Medallion (Bronze/Silver/Gold) Lakehouse architectures using Delta Lake, S3, ADLS Gen2, Snowflake, Redshift and Synapse Analytics.
  • Engineering large-scale distributed processing workloads using Apache Spark, PySpark, Databricks, EMR, Hive and HDFS, processing billions of records for enterprise analytics.
  • Orchestrating complex data workflows using Apache Airflow, Databricks Workflows, AWS Step Functions and Azure Data Factory triggers, ensuring SLA-driven pipeline execution.
  • Building real-time streaming architectures using Apache Kafka, AWS Kinesis, Azure Event Hub and Service Bus, supporting fraud detection, claims monitoring and operational telemetry.
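A minimal, engine-agnostic sketch of the Bronze/Silver/Gold flow the responsibilities describe: in practice each layer would be Delta Lake tables on S3 or ADLS Gen2 processed with PySpark, but here plain Python structures stand in, and the record fields are invented for illustration.

```python
from collections import defaultdict

# Bronze: raw ingested events, kept as-is (duplicates, nulls and all).
bronze = [
    {"id": 1, "store": "A", "amount": "10.5"},
    {"id": 1, "store": "A", "amount": "10.5"},  # duplicate ingest
    {"id": 2, "store": "B", "amount": None},    # bad record
    {"id": 3, "store": "A", "amount": "4.0"},
]

# Silver: cleaned and conformed -- drop nulls, dedupe on id, cast types.
seen = set()
silver = []
for rec in bronze:
    if rec["amount"] is None or rec["id"] in seen:
        continue
    seen.add(rec["id"])
    silver.append({**rec, "amount": float(rec["amount"])})

# Gold: business-level aggregate, e.g. revenue per store.
gold = defaultdict(float)
for rec in silver:
    gold[rec["store"]] += rec["amount"]

print(dict(gold))  # {'A': 14.5}
```

The design point is the same at any scale: raw data lands untouched in Bronze so pipelines are replayable, quality rules live in the Silver step, and Gold holds only consumption-ready aggregates.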
© 2026 Teal Labs, Inc