About The Position

Data is the lifeblood of Klaviyo. As a Data Engineer on this team, you will sit at the intersection of infrastructure and intelligence. You won’t just be moving data; you’ll be building the foundations that power our next generation of AI-driven features and enterprise-scale analytics. You will bridge the gap between data engineering and product impact. Your work will involve developing scalable Spark pipelines, tuning our storage and query patterns to ensure low-latency performance for our enterprise customers, and modeling the high-impact datasets that drive Klaviyo's analytics and machine learning engines.

Requirements

  • Experience: 2+ years of experience in data engineering or a data-intensive software engineering role. You’ve moved past the "beginner" phase and are comfortable taking a project from a design doc to a production deployment.
  • Fluent in SQL and Python: You have a solid grasp of SQL for high-performance querying and are comfortable using Python for data manipulation and automation. You focus on writing code that balances speed with reliability.
  • Distributed Systems Knowledge: You have hands-on experience with Spark (PySpark/SparkSQL) and understand how to tune jobs for performance in a cloud environment (AWS/EMR).
  • Modeling Intuition: You understand the difference between a "raw table" and a "semantic layer." You’ve worked with modern modeling tools (like dbt) and understand partitioning, schema evolution, and lakehouse concepts (Iceberg/Delta).
  • Performance Minded: You care about latency. You enjoy the challenge of making a query run faster and understand how to use materialized views and caching effectively.
  • Collaborative & Curious: You’re an inclusive collaborator who enjoys working with Product and Data Science. You’re excited to experiment with AI tools to make your own engineering workflow more efficient.

Nice To Haves

  • Experience with Iceberg table maintenance and compaction.
  • Exposure to Terraform or other Infrastructure-as-Code tools.
  • A background in Martech or SaaS platforms dealing with high-frequency event data.
  • Experience building data products that directly power customer-facing UI components and/or support AI/ML features.
  • Experience building near real-time or streaming pipelines for user-facing analytics or monitoring.
  • Hands-on work with analytics engineering tools and practices (e.g., dbt, metrics layers, semantic models).
  • Familiarity with statistical modeling and machine learning.

Responsibilities

  • Build Production-Grade Foundations: Develop and maintain scalable data pipelines and core tables using PySpark, Airflow, and dbt. You will implement the foundational datasets that power our AI, ML, and Analytics products.
  • Optimize for Enterprise Performance: Tune Spark jobs and storage patterns to ensure low-latency data retrieval. You will help implement materialized views and efficient partitioning strategies to support high-performance reporting at scale.
  • Treat Data as a Product: Contribute to the full lifecycle of datasets. This includes defining clear data contracts with upstream teams, writing maintainable code via peer reviews, and ensuring every asset is well-documented and trusted by downstream users.
  • Drive Operational Excellence: Ensure the reliability of our data engine by monitoring for freshness, volume anomalies, and schema changes. You will be responsible for ensuring that when a customer loads a dashboard, the data is accurate and on time.
  • Partner Cross-Functionally: Collaborate with Product, Engineering, and AI/ML teams to define consistent metrics that align with business goals. You will act as a bridge to ensure new features land with robust data support.
  • Innovate with AI: Look for opportunities to put AI at the center of your workflow, whether it is using AI to generate tests, detect data anomalies, or accelerate complex analysis.