Data Engineer (DataOps & Infrastructure Focus)

Guidewire Software | San Mateo, CA
$124,000 - $186,000 | Hybrid

About The Position

You will join Guidewire’s Product organization, working at the intersection of data, AI, and cloud infrastructure to power our industry-leading P&C insurance platform. Our team focuses on building secure, scalable, and governed data capabilities that enable AI-infused products and analytics for customers worldwide. We collaborate across Product Development, Security, and Finance to ensure our data platform is robust, reliable, and ready for rapid innovation.

In this role, you will design and automate modern data infrastructure on AWS to support Guidewire’s AI-first Product strategy, including Agentic AI Platform initiatives and next-generation analytics experiences. You will lead DataOps practices (CI/CD, observability, and governance) for the critical pipelines and data products that power our claims, underwriting, and pricing solutions globally. If you are excited by GenAI, large-scale data systems, and building the backbone that helps insurers operate more efficiently, this role puts you at the center of Guidewire’s mission to transform how the world’s P&C insurers do business.

Requirements

  • Demonstrated ability to apply AI and data-driven insights in your current role to drive innovation, productivity, and continuous improvement.
  • 5+ years of experience in Data Engineering, Data Operations, or Platform Engineering building and operating cloud data infrastructure.
  • Deep proficiency with AWS (e.g., S3, EMR, Glue, Lambda, Redshift) and infrastructure-as-code (Terraform strongly preferred; CDK a plus), including designing secure, resilient architectures.
  • Strong experience with dbt in production (modeling, testing, documentation, deployment) and modern table formats such as Apache Iceberg for large-scale analytics.
  • Advanced SQL skills (performance tuning, complex joins and window functions) and solid Python experience for automation, orchestration, and data engineering tasks.
  • Hands-on experience with Apache Spark for large-scale batch or streaming workloads, ideally on AWS EMR or Glue (a minimal sketch follows this list).
  • Proven track record of building or maintaining CI/CD pipelines (Git-based workflows, automated testing, deployment, and monitoring) for data and analytics workloads.
  • Strong systems thinking and data modeling skills (e.g., Kimball, Data Vault) and familiarity with integrating RDBMS data via CDC patterns.
  • Clear, collaborative communication style with the ability to work across product, security, and business stakeholders in a distributed environment.
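
To make the Spark, window-function, and CDC bullets concrete, here is a minimal PySpark sketch (illustrative only, not Guidewire’s actual pipeline) that deduplicates change-data-capture records so only the latest version of each key survives. The bucket paths and column names (policy_id, updated_at, op) are hypothetical.

```python
# Minimal PySpark sketch: keep only the latest CDC record per key.
# All paths and column names (policy_id, updated_at, op) are hypothetical.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdc-dedup").getOrCreate()

# Raw CDC events landed by an upstream connector (inserts/updates/deletes).
cdc = spark.read.parquet("s3://example-raw-bucket/policies_cdc/")

# Rank each record within its key by recency; row_number() == 1 is the latest.
latest = Window.partitionBy("policy_id").orderBy(F.col("updated_at").desc())

current = (
    cdc.withColumn("rn", F.row_number().over(latest))
       .filter("rn = 1")            # newest record per policy
       .filter("op != 'D'")         # drop keys whose latest change is a delete
       .drop("rn")
)

current.write.mode("overwrite").parquet("s3://example-curated-bucket/policies_current/")
```

The same ROW_NUMBER() pattern works in plain Redshift SQL when deduplication happens downstream of ingestion.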

Nice To Haves

  • Experience leveraging Claude Code, LLM-based agents, or other agentic AI tools to automate infrastructure provisioning, refactoring, or documentation workflows.
  • Background working with MLOps platforms such as SageMaker, especially around feature stores and data contracts between data engineering and ML teams.
  • Experience in a regulated environment (Finance, Insurance, or similar) and familiarity with P&C insurance data concepts (policies, claims, billing, rating, underwriting).
  • Exposure to Redshift performance tuning, workload management, and cost optimization in enterprise settings.
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.

Responsibilities

  • Architect, automate, and maintain production-grade data infrastructure on AWS (e.g., S3, EMR, Glue, Lambda, Redshift) using Terraform or CDK, with a focus on high availability, security, and consistent environments across the SDLC (sketched below).
  • Integrate Claude Code and other LLM-based agents into the engineering workflow to accelerate infrastructure provisioning, refactoring, and generation of technical documentation, embedding AI into daily development practices (sketched below).
  • Design, build, and optimize CI/CD pipelines that test, deploy, and monitor dbt models and AWS Glue/Spark jobs, ensuring reliable, repeatable delivery of governed data assets (sketched below).
  • Implement agentic operations for DataOps: configuring AI agents to triage and perform root-cause analysis of pipeline failures, surface cost-optimization signals, and proactively detect schema drift or data-quality regressions (sketched below).
  • Engineer scalable, well-governed data pipelines and tables using Apache Iceberg, Airflow (MWAA), and Redshift, emphasizing simplicity, reusability, and clear ownership of data products.
  • Operationalize security and compliance best practices in a regulated insurance environment, including IAM automation, encryption, audit-ready logging, and alignment with enterprise RBAC/MFA standards (sketched below).
  • Partner with Product Strategy, PDO, and data science teams to ensure data platforms and features can support AI-heavy products like the Agentic AI Platform, Claim Summary, and Underwriting Assistant at scale.
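
The sketches referenced above are illustrative only. First, for the infrastructure-as-code responsibility, a minimal AWS CDK v2 (Python) stack: an encrypted, versioned data-lake bucket plus a Glue catalog database. Construct IDs and names are assumptions; a Terraform setup would express the same resources in HCL.

```python
# Minimal AWS CDK v2 sketch (Python): an encrypted, versioned data-lake bucket
# plus a Glue database for the catalog. All IDs and names are hypothetical.
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3, aws_glue as glue
from constructs import Construct

class DataLakeStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Server-side encryption and versioning on; block all public access.
        s3.Bucket(
            self, "CuratedBucket",
            encryption=s3.BucketEncryption.S3_MANAGED,
            versioned=True,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
        )

        # L1 (Cfn) construct for a Glue database to register curated tables in.
        glue.CfnDatabase(
            self, "CuratedDatabase",
            catalog_id=self.account,
            database_input=glue.CfnDatabase.DatabaseInputProperty(name="curated"),
        )

app = cdk.App()
DataLakeStack(app, "DataLakeStack")
app.synth()
```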
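
For the LLM-agent responsibility: Claude Code itself is driven from its own CLI, so as a rough stand-in this sketch calls the Anthropic Messages API to draft documentation from a Terraform plan. The model id, file paths, and prompt are assumptions.

```python
# Rough sketch: draft infrastructure docs from a Terraform plan with the
# Anthropic Messages API. Model id and paths are assumptions; requires the
# `anthropic` package and ANTHROPIC_API_KEY in the environment.
import pathlib
import anthropic

plan_text = pathlib.Path("plan.txt").read_text()  # e.g., output of `terraform show`

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model id; pin what your org approves
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Summarize this Terraform plan as runbook-style documentation, "
                   "flagging any security-relevant changes:\n\n" + plan_text,
    }],
)

pathlib.Path("INFRA_CHANGES.md").write_text(message.content[0].text)
```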
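
For the CI/CD and orchestration responsibilities, a minimal Airflow DAG of the kind you might deploy on MWAA, building and documenting a dbt project on a daily schedule. The project path, target name, and schedule are assumptions; a production pipeline would add alerting, retries, and environment promotion.

```python
# Minimal Airflow DAG sketch: run and test dbt models on a daily schedule.
# Project path, target, and schedule are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_daily_build",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # `dbt build` runs models and their tests in dependency order.
    dbt_build = BashOperator(
        task_id="dbt_build",
        bash_command="cd /usr/local/airflow/dags/dbt_project && dbt build --target prod",
    )

    # Regenerate docs so the governed catalog stays current.
    dbt_docs = BashOperator(
        task_id="dbt_docs",
        bash_command="cd /usr/local/airflow/dags/dbt_project && dbt docs generate --target prod",
    )

    dbt_build >> dbt_docs
```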
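
For the schema-drift piece of agentic operations, a small boto3 check that compares a Glue Data Catalog table against an expected schema. Database, table, and column names are hypothetical; a real agent would open a ticket or page on-call rather than print.

```python
# Sketch of a schema-drift check against the AWS Glue Data Catalog.
# Database/table names and the expected schema are hypothetical.
import boto3

EXPECTED = {"policy_id": "string", "premium": "double", "updated_at": "timestamp"}

glue = boto3.client("glue")
table = glue.get_table(DatabaseName="curated", Name="policies_current")
actual = {
    col["Name"]: col["Type"]
    for col in table["Table"]["StorageDescriptor"]["Columns"]
}

missing = EXPECTED.keys() - actual.keys()
changed = {c: (EXPECTED[c], actual[c]) for c in EXPECTED if c in actual and actual[c] != EXPECTED[c]}
unexpected = actual.keys() - EXPECTED.keys()

if missing or changed or unexpected:
    # In practice this would feed an alerting/agent workflow, not stdout.
    print(f"Schema drift detected: missing={missing}, changed={changed}, new={unexpected}")
```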
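
Finally, for the security responsibility, an audit-style boto3 sweep that flags S3 buckets without a default encryption configuration. This assumes credentials with the relevant s3:Get* permissions; newer buckets receive SSE-S3 by default, so this mainly catches legacy configurations.

```python
# Sketch of an audit-style check: flag S3 buckets without default encryption.
# Assumes credentials that can read encryption config on all buckets.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        s3.get_bucket_encryption(Bucket=name)
    except ClientError as err:
        if err.response["Error"]["Code"] == "ServerSideEncryptionConfigurationNotFoundError":
            print(f"UNENCRYPTED: {name}")  # feed into audit logging/alerting
        else:
            raise
```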

Benefits

  • Flexible work environment
  • Health and wellness benefits, including health, dental, and vision insurance
  • Paid time off programs, including volunteer time off
  • Market-competitive pay and incentive programs, including an annual company bonus plan, commissions, and long-term incentive awards
  • Company-sponsored retirement plan
  • Continual development and internal career growth opportunities