Senior Data Engineer

EXL
Posted 12 days ago · $140,000 - $180,000 · Hybrid

About The Position

EXL is hiring a Senior Data Engineer to build and operate the data platform behind its insurance analytics work. The role spans batch ETL/ELT with Apache Airflow, dbt, or AWS Glue, real-time streaming with Kafka or Kinesis, and terabyte-scale warehousing in Snowflake or Redshift, supporting use cases such as fraud detection, actuarial modeling, and premium calculation. You will design dimensional models for policies, claims, and customers, develop and tune PySpark and SQL transformations, mentor junior engineers, and partner with actuaries, data scientists, and business teams, working from the Boston office four days per week. Detailed duties are listed under Responsibilities below.

Requirements

  • Experience with Apache Airflow, dbt, or AWS Glue
  • Experience with Snowflake or Redshift
  • Experience with Kafka or Kinesis
  • Experience with Git
  • Experience with Python/PySpark and SQL

Responsibilities

  • Architect and implement ETL/ELT pipelines using Apache Airflow, dbt, or AWS Glue to process insurance data from sources such as policy systems and claims databases (see the orchestration sketch after this list).
  • Optimize large-scale data warehouses (e.g., Snowflake, Redshift) for insurance use cases such as fraud detection, actuarial modeling, and premium calculations, handling terabyte-scale datasets.
  • Develop real-time streaming pipelines with Kafka or Kinesis for event-driven insurance workflows, such as instant claims quoting or policy updates (see the streaming sketch after this list).
  • Use Git for version control, ensuring proper documentation and tracking of code changes.
  • Design data models for insurance entities (policies, claims, customers) using dimensional modeling (see the star-schema sketch after this list).
  • Integrate and transform structured and unstructured data using Python/PySpark and SQL for advanced analytics in underwriting and loss reserving.
  • Automate and maintain ETL pipelines using PySpark and AWS Glue to process and transform large volumes of data efficiently (see the PySpark batch sketch after this list).
  • Collaborate cross-functionally with actuaries, data scientists, and business teams to deliver insights, working 4 days/week in the Boston office for stand-ups and troubleshooting.
  • Monitor and tune the performance of Spark jobs and queries for high-volume insurance datasets, reducing latency for real-time risk scoring.
  • Mentor junior engineers on best practices and contribute to insurance-specific innovations.
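
The batch orchestration work centers on dependency-aware scheduling. Below is a minimal sketch of that pattern, assuming Airflow 2.4+; the DAG name, task names, and extract/transform helpers are hypothetical placeholders, not an actual EXL pipeline.

```python
# Minimal Airflow DAG sketch: ingest nightly policy and claims extracts,
# then run a single transform/load step once both have landed.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_policies(**context):
    ...  # placeholder: pull the nightly policy-system extract (e.g., to S3)


def extract_claims(**context):
    ...  # placeholder: pull the claims-database extract


def transform_and_load(**context):
    ...  # placeholder: trigger a Glue/dbt/PySpark transform into the warehouse


with DAG(
    dag_id="insurance_nightly_etl",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    policies = PythonOperator(task_id="extract_policies", python_callable=extract_policies)
    claims = PythonOperator(task_id="extract_claims", python_callable=extract_claims)
    load = PythonOperator(task_id="transform_and_load", python_callable=transform_and_load)

    # both extracts must succeed before the load runs
    [policies, claims] >> load
```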
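
The real-time side of the role looks more like the Spark Structured Streaming sketch below, reading claim events from Kafka. The broker address, topic, paths, and event schema are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

```python
# Structured Streaming sketch: consume claim events from Kafka and append
# parsed micro-batches to object storage for downstream risk scoring.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("claim_event_stream").getOrCreate()

event_schema = StructType([
    StructField("claim_id", StringType()),
    StructField("policy_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "claim-events")               # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers raw bytes; parse the JSON payload into typed columns.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/streams/claim_events/")              # hypothetical
    .option("checkpointLocation", "s3://example-bucket/checkpoints/claims/")  # hypothetical
    .outputMode("append")
    .start()
)
query.awaitTermination()
```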
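
Dimensional modeling for the policy/claim domain typically means a star schema. The sketch below builds one in PySpark, using a hashed surrogate key purely for illustration; table names, columns, and paths are hypothetical.

```python
# Star-schema sketch: a policy dimension plus a claims fact table keyed on a
# deterministic surrogate key, written out as Parquet.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims_star_schema").getOrCreate()

policies = spark.read.parquet("s3://example-bucket/curated/policies/")  # hypothetical
claims = spark.read.parquet("s3://example-bucket/curated/claims/")      # hypothetical

# Dimension: one row per policy, with a surrogate key derived from the natural key.
dim_policy = (
    policies
    .select("policy_id", "product_line", "state", "effective_date")
    .withColumn("policy_sk", F.sha2(F.col("policy_id").cast("string"), 256))
)

# Fact: one row per claim, carrying the surrogate key plus additive measures.
fact_claim = (
    claims
    .join(dim_policy.select("policy_id", "policy_sk"), on="policy_id", how="left")
    .select("claim_id", "policy_sk", "loss_date", "paid_amount", "reserve_amount")
)

dim_policy.write.mode("overwrite").parquet("s3://example-bucket/warehouse/dim_policy/")
fact_claim.write.mode("overwrite").parquet("s3://example-bucket/warehouse/fact_claim/")
```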
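
Day-to-day PySpark batch transformation and tuning follows the pattern sketched below: join, aggregate, and write partitioned output so that warehouse loads and actuarial queries scan only the slices they need. Source paths and column names are hypothetical.

```python
# PySpark batch sketch: join claims to policies and roll paid losses up to
# policy/month, writing month-partitioned Parquet for efficient downstream scans.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("monthly_loss_rollup").getOrCreate()

claims = spark.read.parquet("s3://example-bucket/raw/claims/")      # hypothetical path
policies = spark.read.parquet("s3://example-bucket/raw/policies/")  # hypothetical path

monthly_losses = (
    claims
    .join(policies, on="policy_id", how="inner")
    .withColumn("loss_month", F.date_trunc("month", F.col("loss_date")))
    .groupBy("policy_id", "loss_month")
    .agg(
        F.sum("paid_amount").alias("paid_losses"),
        F.count("claim_id").alias("claim_count"),
    )
)

(
    monthly_losses
    .repartition("loss_month")       # one write task per month partition
    .write.mode("overwrite")
    .partitionBy("loss_month")
    .parquet("s3://example-bucket/curated/monthly_losses/")
)
```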

What This Job Offers

Job Type

Full-time

Career Level

Senior

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees
