Data Engineer (Remote)

The Phia Group
Canton, MA
Remote

About The Position

The Data Engineer supports the development, maintenance, and optimization of data pipelines and analytics-ready datasets. You will collaborate across multiple teams and stakeholders to solve complex problems and support data-driven initiatives.

Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, Information Technology, or a related field; or equivalent experience
  • 5+ years of experience in data engineering or business intelligence roles working with ETL, data modeling, data architecture, and developing pipelines and applications for analytics (e.g., BI, reporting, machine learning, deep learning)
  • Solid programming skills in advanced SQL, Python, or other programming languages for data processing and automation
  • Experience supporting or working with AI/ML workflows, including:
      ◦ Data preparation and feature engineering for machine learning models
      ◦ Integration of data pipelines with ML frameworks (e.g., scikit-learn, TensorFlow, PyTorch, or similar)
      ◦ Understanding of model lifecycle concepts (training, validation, deployment, monitoring)
  • Expertise working with Snowflake for data warehousing, including experience with schema design, performance tuning, and optimization
  • Proficiency with Git, Azure DevOps, and collaborative development best practices
  • Experience designing, developing, and deploying end-to-end pipelines using Azure Data Factory

Responsibilities

  • Build, maintain, and optimize data pipelines utilizing Azure Data Factory, ensuring data is ingested, transformed, and delivered to Snowflake reliably for analytics
  • Implement monitoring, alerts, and testing of data pipeline performance, data quality metrics, and lineage to ensure trustworthy data delivery
  • Troubleshoot data issues and perform root cause analysis to proactively resolve operational problems
  • Document data structures, processes, architectural decisions, and best practices for knowledge sharing
  • Develop, maintain, and optimize Snowflake objects (schemas, tables, views) and SQL transformations to produce curated, analytics-ready datasets
  • Collaborate with analysts, stakeholders, and product owners to translate business needs into data requirements and stable technical implementations
  • Enable data for AI/ML use cases by preparing feature-rich datasets, supporting feature engineering, and ensuring data consistency for model training and inference
  • Support deployment and operationalization of machine learning models by integrating pipelines with ML workflows (e.g., batch/real-time scoring)
  • Continually improve ongoing reporting and analytics by automating or simplifying manual and self-service processes
  • Implement version control practices for all data engineering code and documentation

Benefits

  • Unparalleled benefits