- Architect and implement ETL/ELT pipelines using Apache Airflow, dbt, or AWS Glue to process insurance data from sources such as policy systems and claims databases (a PySpark sketch of this kind of batch transform follows this list).
- Optimize large-scale data warehouses (e.g., Snowflake, Redshift) for insurance use cases such as fraud detection, actuarial modeling, and premium calculations, handling terabyte-scale datasets.
- Develop real-time streaming pipelines with Kafka or Kinesis for event-driven insurance workflows, such as instant claims quoting or policy updates (see the streaming sketch after this list).
- Use Git for version control, ensuring proper documentation and tracking of code changes.
- Design data models for insurance entities (policies, claims, customers) using dimensional modeling.
- Integrate and transform structured and unstructured data using Python/PySpark and SQL for advanced analytics in underwriting and loss reserving.
- Automate and maintain ETL pipelines using PySpark and AWS Glue to process and transform large volumes of data efficiently.
- Collaborate cross-functionally with actuaries, data scientists, and business teams to deliver insights, working 4 days/week in the Boston office for stand-ups and troubleshooting.
- Monitor and tune the performance of Spark jobs and queries on high-volume insurance datasets, reducing latency for real-time risk scoring.
- Mentor junior engineers on best practices and contribute to insurance-specific innovations.
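As a rough illustration of the batch ETL work described above, the sketch below shows a minimal PySpark transform over claims data. The bucket paths, column names (claim_id, policy_id, claim_amount, claim_date), and cleansing rules are assumptions for illustration only; an actual AWS Glue job would wrap comparable logic in a GlueContext.

```python
# Minimal PySpark sketch of a batch claims transform. Paths and column
# names are hypothetical; a Glue job would use a GlueContext around
# similar DataFrame logic.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_etl_sketch").getOrCreate()

# Read raw claims extracted from the claims database (placeholder path).
raw_claims = spark.read.parquet("s3://example-bucket/raw/claims/")

# Basic cleansing: drop duplicate claim records, normalize amounts,
# and derive a partition column from the claim date.
clean_claims = (
    raw_claims
    .dropDuplicates(["claim_id"])
    .withColumn("claim_amount", F.col("claim_amount").cast("decimal(18,2)"))
    .withColumn("claim_year", F.year(F.col("claim_date")))
    .filter(F.col("claim_amount") > 0)
)

# Write curated claims partitioned by year for downstream analytics.
(clean_claims
    .write
    .mode("overwrite")
    .partitionBy("claim_year")
    .parquet("s3://example-bucket/curated/claims/"))
```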
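Similarly, the real-time streaming responsibility might look something like the Spark Structured Streaming sketch below, which consumes claim events from Kafka. The broker address, topic name, event schema, and output paths are hypothetical, and running it requires the spark-sql-kafka connector on the classpath.

```python
# Hedged sketch of a Structured Streaming consumer for claim events from
# Kafka; topic, brokers, schema, and paths are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("claim_events_stream_sketch").getOrCreate()

# Assumed shape of a claim event published by the policy/claims systems.
event_schema = StructType([
    StructField("claim_id", StringType()),
    StructField("policy_id", StringType()),
    StructField("claim_amount", DoubleType()),
    StructField("event_time", StringType()),
])

# Read claim events from a hypothetical Kafka topic.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "claim-events")
    .load()
)

# Parse the JSON payload and keep only the fields needed downstream.
parsed = (
    events
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withColumn("event_time", F.to_timestamp("event_time"))
)

# Land parsed events as Parquet in micro-batches for risk-scoring jobs.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/streaming/claim_events/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/claim_events/")
    .start()
)
# query.awaitTermination() would block here until the stream is stopped.
```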
Job Type: Full-time
Career Level: Senior
Education Level: No Education Listed
Number of Employees: 5,001-10,000 employees