Senior Software Engineer

Klaviyo
Boston, MA
Hybrid

About The Position

As the Senior Software Engineer for Product Recommendations, you will be a key contributor in building the machine learning–powered systems that decide which products to show to whom and when across all channels powered by our platform. This hands-on backend role focuses on converting billions of behavioral events into personalized product recommendations that drive revenue for merchants. You will define technical direction and build and operate services and data pipelines end to end, from data ingestion and feature generation to ranking models and APIs.

Requirements

  • 5+ years of software engineering experience, with experience building and operating mission-critical backend services and systems in a production environment.
  • Experience in backend and distributed systems at scale; you have a proven track record working on high-throughput, highly available services and are skilled at optimizing for latency, reliability, and operability.
  • Proficient in Python and open to working in other languages.
  • Comfortable with cloud-native architectures (AWS preferred) and container orchestration (e.g., Kubernetes); you manage infrastructure and CI/CD pipelines as a core part of your development process.
  • Experience in data-driven decision making and A/B testing: you can define how to instrument experiments, read and interpret results, and ensure learnings are folded back into system design.
  • Comfortable designing and querying data models in relational, analytical, and NoSQL datastores (e.g., Postgres, MySQL, data warehouses, Redis, vector databases).
  • At home with modern DevOps practices (CI/CD, monitoring, alerting) and able to apply them when architecting large-scale data and recommendation systems.
  • Track record of owning multi-component projects end-to-end, from initial technical design and implementation through rollout, monitoring, and sustained iteration.
  • Excellent technical collaborator and communicator: you can clearly articulate complex technical trade-offs to both technical peers and non-technical partners, and you work effectively to drive alignment across ML Engineers, Software Engineers, PMs, and other teams.
  • You are a self-starter who has actively experimented with AI in work or personal projects and are excited to responsibly explore and define new AI tools and workflows to enhance team productivity and system intelligence.

Nice To Haves

  • Previous experience working on product recommendation systems or adjacent ML-powered features (ranking, personalization, search, or similar).
  • Experience with big data frameworks such as Apache Spark (or similar technologies like Flink, Beam, etc.) for architecting and building complex batch or streaming pipelines.
  • Experience in AI/ML systems and products, such as integrating models into production systems, building features powered by ML, or contributing to the ML infrastructure.
  • Experience training and iterating on machine learning models (e.g., for ranking, prediction, or personalization).
  • Experience with ML and distributed compute frameworks such as Ray or similar tools.
  • Experience partnering with data science or ML teams to productionize models (designing feature stores, ensuring offline/online parity, advanced model deployment and monitoring).
  • Background in e-commerce, marketing tech, or consumer personalization products.

Responsibilities

  • Define technical direction, build, and operate services and data pipelines end to end, from data ingestion and feature generation to ranking models and APIs.
  • Lead the design, architecture, and operation of backend services that power product recommendations across Klaviyo experiences (email, SMS, KAgent, onsite, etc.), upholding standards for reliability, performance, and clear APIs.
  • Architect and maintain robust, large-scale data processing pipelines (e.g., using Apache Spark or similar frameworks) that transform raw events and catalog data into high-quality features and inputs for recommendation models, ensuring data quality and lineage.
  • Collaborate closely with ML engineers and product stakeholders to productionize recommendation models, defining high-level interfaces, robust feature contracts, and advanced deployment patterns for batch and/or real-time inference systems.
  • Drive the development of ML/AI systems such as vector search that power recommendation, semantic search, and sophisticated agentic use cases.
  • Implement and evolve data and service observability (metrics, logging, tracing, dashboards) to proactively ensure recommendations are correct, fast, and highly available for all customers.
  • Contribute to and mentor others on shared data frameworks, libraries, and architectural patterns to accelerate the development of new recommendation use cases and iteration velocity across the team.
  • Work with Product to break down projects into clear milestones, balancing the need for rapid experimentation with technical soundness and long-term maintainability.
  • Lead data-driven decision making and A/B testing efforts, ensuring recommendation systems are instrumented with the right metrics and independently interpreting results to guide future product and engineering iterations.
  • Participate in on-call and incident response for the systems you own, driving major post-incident follow-ups that substantially improve the resilience and operability of our recommendation stack.
  • Champion and drive the transformation of engineering workflows by integrating AI from the ground up—for example, using AI to accelerate development, automate complex tests, or build smarter monitoring and debugging tools.
  • Share knowledge, mentor junior/mid-level engineers, and define best practices on working with large-scale data frameworks, distributed systems, and integrating ML into production systems.