- Build distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real time.
- Create metrics and apply business logic using Spark, Scala, R, Python, and/or Java.
- Model, design, develop, code, test, debug, document, and deploy applications to production through standard processes.
- Harmonize, transform, and move data from raw formats to consumable, curated views.
- Analyze, design, develop, and test applications.
- Contribute to the maturation of Data Engineering practices, which may include providing training and mentoring to others.
- Perform Data Designer activities to transform raw data into meaningful datasets and extracts, such as business logic design, source-to-target mappings, data sourcing strategy, and transformation rules.
- Apply strong Data Governance principles, standards, and frameworks to promote data consistency and quality while effectively managing and protecting the integrity of corporate data.
- Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to gather requirements and deliver data solutions aligned with business goals.
- Document technical specifications, data flow diagrams, and operational procedures to support knowledge transfer and audit requirements.
- Develop ETL/ELT workflows using Python, PySpark, and SQL to transform raw data into structured formats suitable for downstream consumption.
- Ensure data quality and integrity by implementing data validation, monitoring, and alerting mechanisms using tools such as AWS CloudWatch, Glue DataBrew, and custom scripts.
Job Type
Full-time
Career Level
Entry Level
Number of Employees
251-500 employees