Senior Data Engineer

EXL
Posted 12 days ago · $140,000 - $180,000 · Hybrid

About The Position

EXL is hiring a Senior Data Engineer to build and operate the data platform behind its insurance analytics work. The role spans batch ETL/ELT with Apache Airflow, dbt, or AWS Glue, real-time streaming with Kafka or Kinesis, and terabyte-scale warehousing in Snowflake or Redshift, supporting use cases such as fraud detection, actuarial modeling, and premium calculation. You will design dimensional models for policies, claims, and customers, develop and tune PySpark and SQL transformations, mentor junior engineers, and partner with actuaries, data scientists, and business teams, working from the Boston office four days per week. Detailed duties are listed under Responsibilities below.

Requirements

  • Experience with Apache Airflow, dbt, or AWS Glue
  • Experience with Snowflake or Redshift
  • Experience with Kafka or Kinesis
  • Experience with Git
  • Experience with Python/PySpark and SQL

Responsibilities

  • Architect and implement ETL/ELT pipelines using Apache Airflow, dbt, or AWS Glue to process insurance data from sources such as policy systems and claims databases (see the orchestration sketch after this list).
  • Optimize large-scale data warehouses (e.g., Snowflake, Redshift) for insurance use cases such as fraud detection, actuarial modeling, and premium calculations, handling terabyte-scale datasets.
  • Develop real-time streaming pipelines with Kafka or Kinesis for event-driven insurance workflows, such as instant claims quoting or policy updates (see the streaming sketch after this list).
  • Use Git for version control, ensuring proper documentation and tracking of code changes.
  • Design data models for insurance entities (policies, claims, customers) using dimensional modeling (see the star-schema sketch after this list).
  • Integrate and transform structured and unstructured data using Python/PySpark and SQL for advanced analytics in underwriting and loss reserving.
  • Automate and maintain ETL pipelines using PySpark and AWS Glue to process and transform large volumes of data efficiently (see the PySpark batch sketch after this list).
  • Collaborate cross-functionally with actuaries, data scientists, and business teams to deliver insights, working 4 days/week in the Boston office for stand-ups and troubleshooting.
  • Monitor and tune the performance of Spark jobs and queries for high-volume insurance datasets, reducing latency for real-time risk scoring.
  • Mentor junior engineers on best practices and contribute to insurance-specific innovations.
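
The batch orchestration work centers on dependency-aware scheduling. Below is a minimal sketch of that pattern, assuming Airflow 2.4+; the DAG name, task names, and extract/transform helpers are hypothetical placeholders, not an actual EXL pipeline.

```python
# Minimal Airflow DAG sketch: ingest nightly policy and claims extracts,
# then run a single transform/load step once both have landed.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_policies(**context):
    ...  # placeholder: pull the nightly policy-system extract (e.g., to S3)


def extract_claims(**context):
    ...  # placeholder: pull the claims-database extract


def transform_and_load(**context):
    ...  # placeholder: trigger a Glue/dbt/PySpark transform into the warehouse


with DAG(
    dag_id="insurance_nightly_etl",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    policies = PythonOperator(task_id="extract_policies", python_callable=extract_policies)
    claims = PythonOperator(task_id="extract_claims", python_callable=extract_claims)
    load = PythonOperator(task_id="transform_and_load", python_callable=transform_and_load)

    # both extracts must succeed before the load runs
    [policies, claims] >> load
```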
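
The real-time side of the role looks more like the Spark Structured Streaming sketch below, reading claim events from Kafka. The broker address, topic, paths, and event schema are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

```python
# Structured Streaming sketch: consume claim events from Kafka and append
# parsed micro-batches to object storage for downstream risk scoring.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("claim_event_stream").getOrCreate()

event_schema = StructType([
    StructField("claim_id", StringType()),
    StructField("policy_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "claim-events")               # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers raw bytes; parse the JSON payload into typed columns.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/streams/claim_events/")              # hypothetical
    .option("checkpointLocation", "s3://example-bucket/checkpoints/claims/")  # hypothetical
    .outputMode("append")
    .start()
)
query.awaitTermination()
```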
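
Dimensional modeling for the policy/claim domain typically means a star schema. The sketch below builds one in PySpark, using a hashed surrogate key purely for illustration; table names, columns, and paths are hypothetical.

```python
# Star-schema sketch: a policy dimension plus a claims fact table keyed on a
# deterministic surrogate key, written out as Parquet.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims_star_schema").getOrCreate()

policies = spark.read.parquet("s3://example-bucket/curated/policies/")  # hypothetical
claims = spark.read.parquet("s3://example-bucket/curated/claims/")      # hypothetical

# Dimension: one row per policy, with a surrogate key derived from the natural key.
dim_policy = (
    policies
    .select("policy_id", "product_line", "state", "effective_date")
    .withColumn("policy_sk", F.sha2(F.col("policy_id").cast("string"), 256))
)

# Fact: one row per claim, carrying the surrogate key plus additive measures.
fact_claim = (
    claims
    .join(dim_policy.select("policy_id", "policy_sk"), on="policy_id", how="left")
    .select("claim_id", "policy_sk", "loss_date", "paid_amount", "reserve_amount")
)

dim_policy.write.mode("overwrite").parquet("s3://example-bucket/warehouse/dim_policy/")
fact_claim.write.mode("overwrite").parquet("s3://example-bucket/warehouse/fact_claim/")
```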
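
Day-to-day PySpark batch transformation and tuning follows the pattern sketched below: join, aggregate, and write partitioned output so that warehouse loads and actuarial queries scan only the slices they need. Source paths and column names are hypothetical.

```python
# PySpark batch sketch: join claims to policies and roll paid losses up to
# policy/month, writing month-partitioned Parquet for efficient downstream scans.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("monthly_loss_rollup").getOrCreate()

claims = spark.read.parquet("s3://example-bucket/raw/claims/")      # hypothetical path
policies = spark.read.parquet("s3://example-bucket/raw/policies/")  # hypothetical path

monthly_losses = (
    claims
    .join(policies, on="policy_id", how="inner")
    .withColumn("loss_month", F.date_trunc("month", F.col("loss_date")))
    .groupBy("policy_id", "loss_month")
    .agg(
        F.sum("paid_amount").alias("paid_losses"),
        F.count("claim_id").alias("claim_count"),
    )
)

(
    monthly_losses
    .repartition("loss_month")       # one write task per month partition
    .write.mode("overwrite")
    .partitionBy("loss_month")
    .parquet("s3://example-bucket/curated/monthly_losses/")
)
```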

What This Job Offers

Job Type

Full-time

Career Level

Senior

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees
