Data Engineer

Fusemachines
New York City, NY
Remote

About The Position

This is a remote, full-time consulting position responsible for designing, building, and maintaining the infrastructure required for data integration, storage, processing, and analytics (BI, visualization and Advanced Analytics). We are looking for a skilled Senior Data Engineer with a strong background in Python, SQL, PySpark, Azure, Databricks, Synapse, Azure Data Lake, DevOps and cloud-based large-scale data applications, with a passion for data quality, performance and cost optimization. The ideal candidate will develop in an Agile environment, contributing to the architecture, design, and implementation of data products, including migration from Synapse to Azure Data Lake. This role involves hands-on coding, mentoring junior staff and collaborating with multidisciplinary teams to achieve project objectives.

Requirements

  • Must have a full-time Bachelor's degree in Computer Science or similar
  • At least 3 years of experience as a data engineer with strong expertise in Databricks, Azure, DevOps, or other hyperscalers.
  • 3+ years of experience with Azure DevOps, GitHub.
  • Proven experience delivering large scale projects and products for Data and Analytics, as a data engineer, including migrations.
  • Databricks Certified Associate Developer for Apache Spark
  • Databricks Certified Data Engineer Associate
  • Microsoft Certified: Azure Fundamentals
  • Microsoft Certified: Azure Data Engineer Associate
  • Strong programming skills in one or more languages such as Python (must have) and Scala, and proficiency in writing efficient, optimized code for data integration, migration, storage, processing and manipulation.
  • Strong understanding and experience with SQL and writing advanced SQL queries.
  • Thorough understanding of big data principles, techniques, and best practices.
  • Strong experience with scalable and distributed data processing technologies such as Spark/PySpark (must have: experience with Azure Databricks), dbt and Kafka, to be able to handle large volumes of data.
  • Solid Databricks development experience with significant Python, PySpark, Spark SQL, Pandas, NumPy in Azure environment.
  • Strong experience in designing and implementing efficient ELT/ETL processes in Azure and Databricks and with open source solutions, with the ability to develop custom integration solutions as needed.
  • Skilled in Data Integration from different sources such as APIs, databases, flat files, event streaming.
  • Expertise in data cleansing, transformation, and validation.
  • Proficiency with relational databases (Oracle, SQL Server, MySQL, Postgres, or similar) and NoSQL databases (MongoDB or Table).
  • Good understanding of Data Modeling and Database Design Principles.
  • Ability to design and implement efficient database schemas that meet the requirements of the data architecture to support data solutions.
  • Strong experience in designing and implementing data warehouse, data lake, and data lakehouse solutions in Azure and Databricks.
  • Good experience with Delta Lake, Unity Catalog, Delta Sharing, Delta Live Tables (DLT).
  • Strong understanding of the software development lifecycle (SDLC), especially Agile methodologies.
  • Strong knowledge of SDLC tools and technologies such as Azure DevOps and GitHub, including project management software (Jira, Azure Boards or similar), source code management (GitHub, Azure Repos or similar), CI/CD systems (GitHub Actions, Azure Pipelines, Jenkins or similar) and binary repository managers (Azure Artifacts or similar).
  • Strong understanding of DevOps principles, including continuous integration and continuous delivery (CI/CD), infrastructure as code (IaC – Terraform, ARM, including hands-on experience), configuration management, automated testing, performance tuning, and cost management and optimization.
  • Strong knowledge of cloud computing, specifically Microsoft Azure services related to data and analytics, such as Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake, Azure Stream Analytics, SQL Server, Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database, etc.
  • Experience in Orchestration using technologies like Databricks workflows and Apache Airflow.
  • Strong knowledge of data structures and algorithms and good software engineering practices.
  • Proven experience migrating from Azure Synapse to Azure Data Lake, or other technologies.
  • Strong analytical skills to identify and address technical issues, performance bottlenecks, and system failures.
  • Proficiency in debugging and troubleshooting issues in complex data and analytics environments and pipelines.
  • Good understanding of Data Quality and Governance, including implementation of data quality checks and monitoring processes to ensure that data is accurate, complete, and consistent.
  • Strong written and verbal communication skills to collaborate and articulate complex situations concisely with cross-functional teams, including business users, data architects, DevOps engineers, data analysts, data scientists, developers, and operations teams.
  • Ability to document processes, procedures, and deployment configurations.
  • Understanding of security practices, including network security groups, Azure Active Directory, encryption, and compliance standards.
  • Ability to implement security controls and best practices within data and analytics solutions, including proficient knowledge and working experience on various cloud security vulnerabilities and ways to mitigate them.
  • Self-motivated with the ability to work well in a team, and experienced in mentoring and coaching different members of the team.
  • A willingness to stay updated with the latest services, Data Engineering trends, and best practices in the field.
  • Comfortable with picking up new technologies independently and working in a rapidly changing environment with ambiguous requirements.
  • Care about architecture, observability, testing, and building reliable infrastructure and data pipelines.

Nice To Haves

  • Microsoft Exam: Designing and Implementing Microsoft DevOps Solutions
  • Experience with BI solutions, including Power BI.

Responsibilities

  • Architect, design, develop, test and maintain high-performance, large-scale, complex data architectures that support data integration (batch and real-time, ETL and ELT patterns from heterogeneous data systems: APIs and platforms), storage (data lakes, warehouses, data lakehouses, etc.), processing, orchestration and infrastructure.
  • Ensure the scalability, reliability, and performance of data systems, focusing on Databricks and Azure.
  • Contribute to detailed design, architectural discussions, and customer requirements sessions.
  • Actively participate in the design, development, and testing of big data products.
  • Construct and fine-tune Apache Spark jobs and clusters within the Databricks platform.
  • Migrate out of Azure Synapse to Azure Data Lake or other technologies.
  • Assess best practices and design schemas that match business needs for delivering a modern analytics solution (descriptive, diagnostic, predictive, prescriptive).
  • Design and implement data models and schemas that support efficient data processing and analytics.
  • Design and develop clear, maintainable code with automated testing using Pytest, unittest, integration tests, performance tests, regression tests, etc.
  • Collaborate with cross-functional teams, including Product, Engineering, Data Scientists and Analysts, to understand data requirements and develop data solutions, including reusable components that meet product deliverables.
  • Evaluate and implement new technologies and tools to improve data integration, data processing, storage and analysis.
  • Evaluate, design, implement and maintain data governance solutions: cataloging, lineage, data quality and data governance frameworks that are suitable for a modern analytics solution, considering industry-standard best practices and patterns.
  • Continuously monitor and fine-tune workloads and clusters to achieve optimal performance.
  • Provide guidance and mentorship to junior team members, sharing knowledge and best practices.
  • Maintain clear and comprehensive documentation of the solutions, configurations, and best practices implemented.
  • Promote and enforce best practices in data engineering, data governance, and data quality.
  • Ensure data quality and accuracy.
  • Design, implement and maintain data security and privacy measures.
  • Be an active member of an Agile team, participating in all ceremonies and continuous improvement activities, being able to work independently as well as collaboratively.