About The Position

Data is the lifeblood of Klaviyo. As a Data Engineer on this team, you will sit at the intersection of infrastructure and intelligence. You won’t just be moving data; you’ll be building the foundations that power our next generation of AI-driven features and enterprise-scale analytics. You will bridge the gap between data engineering and product impact. Your work will involve developing scalable Spark pipelines, tuning our storage and query patterns to ensure low-latency performance for our enterprise customers, and modeling the high-impact datasets that drive Klaviyo's analytics and machine learning engines.

Requirements

  • Experience: 2+ years of experience in data engineering or a data-intensive software engineering role. You’ve moved past the "beginner" phase and are comfortable taking a project from a design doc to a production deployment.
  • Fluent in SQL and Python: You have a solid grasp of SQL for high-performance querying and are comfortable using Python for data manipulation and automation. You focus on writing code that balances speed with reliability.
  • Distributed Systems Knowledge: You have hands-on experience with Spark (PySpark/SparkSQL) and understand how to tune jobs for performance in a cloud environment (AWS/EMR).
  • Modeling Intuition: You understand the difference between a "raw table" and a "semantic layer." You’ve worked with modern modeling tools (like dbt) and understand partitioning, schema evolution, and lakehouse concepts (Iceberg/Delta).
  • Performance Minded: You care about latency. You enjoy the challenge of making a query run faster and understand how to use materialized views and caching effectively.
  • Collaborative & Curious: You’re an inclusive collaborator who enjoys working with Product and Data Science. You’re excited to experiment with AI tools to make your own engineering workflow more efficient.

Nice To Haves

  • Experience with Iceberg table maintenance and compaction.
  • Exposure to Terraform or other Infrastructure-as-Code tools.
  • A background in Martech or SaaS platforms dealing with high-frequency event data.
  • Experience building data products that directly power customer-facing UI components and/or support AI/ML features.
  • Experience building near real-time or streaming pipelines for user-facing analytics or monitoring.
  • Hands-on work with analytics engineering tools and practices (e.g., dbt, metrics layers, semantic models).
  • Familiarity with statistical modeling and machine learning.

Responsibilities

  • Build Production-Grade Foundations: Develop and maintain scalable data pipelines and core tables using PySpark, Airflow, and dbt. You will implement the foundational datasets that power our AI, ML, and Analytics products.
  • Optimize for Enterprise Performance: Tune Spark jobs and storage patterns to ensure low-latency data retrieval. You will help implement materialized views and efficient partitioning strategies to support high-performance reporting at scale.
  • Treat Data as a Product: Contribute to the full lifecycle of datasets. This includes defining clear data contracts with upstream teams, writing maintainable code via peer reviews, and ensuring every asset is well-documented and trusted by downstream users.
  • Drive Operational Excellence: Ensure the reliability of our data engine by monitoring for freshness, volume anomalies, and schema changes. You will be responsible for ensuring that when a customer loads a dashboard, the data is accurate and on time.
  • Partner Cross-Functionally: Collaborate with Product, Engineering, and AI/ML teams to define consistent metrics that align with business goals. You will act as a bridge to ensure new features land with robust data support.
  • Innovate with AI: Look for opportunities to put AI at the center of your workflow, whether it is using AI to generate tests, detect data anomalies, or accelerate complex analysis.