Capco • Posted 2 months ago
$131,000 - $150,000/Yr
Full-time • Mid Level
New York, NY
1,001-5,000 employees
Professional, Scientific, and Technical Services

The Data Engineer will serve as the lead technical specialist for designing and implementing data science and advanced analytics capabilities on Microsoft Fabric and Azure Databricks. This role focuses on data processing, identity resolution, entity linking, and data warehouse development that enable organizations to unify fragmented data across multiple systems into a trusted, governed, and analytics-ready model. The ideal candidate combines deep hands-on expertise in Databricks engineering, data modeling, and applied data science with the ability to build scalable, production-grade data solutions in collaboration with business, engineering, and analytics teams.

  • Design and develop data lakehouse and warehouse structures within Azure Databricks and Fabric environments.
  • Build ETL and ELT pipelines to extract, cleanse, normalize, and enrich data from CRM, ERP, LMS, and financial systems.
  • Develop reusable data transformation and validation frameworks leveraging PySpark, SQL, and Delta Live Tables.
  • Support the operationalization of the central data warehouse using Azure SQL and Fabric Data Warehouse.
  • Implement entity resolution models to unify customer, member, or participant records across systems using deterministic and probabilistic matching techniques.
  • Design and deploy matching algorithms utilizing Databricks MLflow, PySpark, and Azure Machine Learning for cross-system deduplication and linkage.
  • Collaborate with architects to define unique identifiers, external keys, and golden record frameworks for enterprise data integration.
  • Monitor and continuously refine data matching accuracy, precision, and recall metrics.
  • Develop and schedule data ingestion pipelines in Microsoft Fabric and Azure Databricks for recurring Excel, CSV, and structured PDF sources using Power Automate, Form Recognizer, and Fabric Dataflows.
  • Apply data quality and validation rules to flag incomplete, inconsistent, or stale records.
  • Build and automate data lineage, change data capture (CDC), and error-handling workflows.
  • Support performance tuning and scalability for high-volume processing environments.
  • Provide curated and feature-engineered datasets for Power BI dashboards and machine learning use cases.
  • Partner with data analysts to define KPIs and enable cross-system reporting and predictive insights.
  • Develop scripts and notebooks to support exploratory data analysis (EDA) and visualization in Databricks.

  • BA in Data Science, Computer Science, Applied Mathematics, or related discipline.
  • 5+ years of experience in data engineering and applied data science on Azure platforms.
  • 3+ years building and managing pipelines in Azure Databricks (PySpark, Delta Lake, MLflow).
  • 2+ years hands-on experience with Microsoft Fabric (Data Factory, Dataflow Gen2, Data Warehouse).
  • Experience with Power BI integration and data modeling.
  • Experience with entity resolution and master data management (MDM) methods.
  • Experience with statistical modeling, clustering, and record linkage algorithms.
  • Experience with data governance, lineage tracking, and compliance (PII, HIPAA, etc.).
  • Proven track record implementing identity resolution and entity linking frameworks.
  • Strong background in SQL, Python, and large-scale data processing for analytics.

  • Microsoft Certified: Fabric Analytics Engineer Associate
  • Microsoft Certified: Azure Data Scientist Associate
  • Databricks Certified Machine Learning Professional
  • Microsoft Certified: Azure Data Engineer Associate

  • Medical, dental, and vision insurance
  • 401(k) plan
  • Tuition reimbursement
  • Work culture focused on innovation and creation of lasting value for clients and employees