Lead Spark Data Engineer

Fusemachines, New York, NY

About The Position

We are looking for an experienced Lead Data Engineer to join our team and build the "Brain" of an IoT platform: a library that lets users define metrics, validates those definitions against a Virtual Schema, and generates optimized execution plans for both Spark (Batch) and Flink (Stream).

Requirements

  • 5+ years of hands-on data engineering experience with deep expertise in the Azure ecosystem.
  • Expert-level Java, Python and SQL.
  • Deep understanding of Apache Spark internals (Catalyst Optimizer, Logical Plans).
  • Experience with ANTLR v4 or writing custom DSLs/Parsers.
  • Experience with Databricks and Delta Lake optimization.
  • Experience constructing Abstract Syntax Trees (ASTs).
  • Strong understanding of SDLC and Agile methodologies with hands-on experience in Azure DevOps, GitHub, CI/CD, and artifact management.
  • Skilled in data modeling, data design, and data warehousing solutions on Azure Databricks.
  • Knowledge of data quality, governance, and security best practices within Azure (AD, NSG, encryption, compliance).

Nice To Haves

  • Certifications: Azure Fundamentals, Azure Data Engineer Associate, Databricks Certified Data Engineer Professional, and Azure Solutions Architect Expert.

Responsibilities

  • Architect, design, and implement scalable and efficient data solutions on Spark and Flink.
  • Implement the grammar for the IoT Query Language.
  • Build the Query Validator to enforce semantic constraints before a query is executed.
  • Develop a Spark Adapter: a translation layer that converts metric definitions into Spark code.
  • Implement relationship logic (traversing a Graph/Ontology) within the core to avoid database bottlenecks.
  • Ensure 100% logic parity between Spark (Batch) and Flink (Stream) implementations.
  • Manage and optimize Azure and Databricks resources for performance, reliability, and cost-efficiency.
  • Transform, clean, and prepare data using SQL, Python and Java.
  • Monitor and fine-tune workloads and pipelines for optimal performance and reliability.
  • Maintain clear documentation of solutions, configurations, and workflows.
  • Actively participate in Agile team activities and continuous improvement initiatives.
  • Promote and enforce data engineering best practices, including data governance, security, and data quality.