Director of Engineering, Infrastructure

Klaviyo•Boston, MA

$244,000 - $366,000

About The Position

At Klaviyo, we value the unique backgrounds, experiences, and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace every day. We believe everyone deserves a fair shot at success and appreciate the experiences each person brings beyond the traditional job requirements. If you’re a close but not exact match with the description, we hope you’ll still consider applying. Want to learn more about life at Klaviyo? Visit klaviyo.com/careers to see how we empower creators to own their own destiny.

Klaviyo’s mission is to empower businesses to independently drive their growth, and the Engineering Department's contribution to this mission is crucial. As Director of Production Infrastructure, you'll lead the creation and management of a high-performance platform designed to support the rapid innovation demanded by our R&D teams. This role is about defining the pillars of our infrastructure (compute, storage, networking, and observability) and setting a robust set of principles that guide their use.

In this position, you'll be responsible for developing and maintaining platform primitives that empower our engineering teams to bring ideas to life seamlessly. Collaborating with industry leaders across engineering, security, and finance, you'll make decisions that shape the infrastructure blueprint underpinning our scalable, secure, and cost-effective operations. As a leader, your mission is to foster a culture of ownership, innovation, and productivity while steering teams toward achieving critical reliability and performance metrics. Your role will span defining clear service contracts, instituting capacity plans, and honing our developer enablement strategy to reduce friction and enhance developer velocity.

Requirements

Over 10 years of experience in infrastructure, SRE, platform engineering, or security engineering, with at least 5 years managing managers and senior ICs, demonstrating strong leadership and team-building skills.
Leadership style fosters inclusivity and empowerment, setting high standards that inspire your teams to achieve ambitious goals aligned with Klaviyo's mission.
Deep understanding of SRE principles, with proven expertise in designing effective SLOs and SLIs, managing incidents, and ensuring capacity planning and operational continuity.
Experience with LiveSite (the enterprise web platform), and an understanding of the infrastructure requirements, integration patterns, and operational demands that come with supporting enterprise-grade web properties at scale.
Own the live site: bring urgency, clear judgment, and a structured approach to production incidents, and build teams that treat uptime and reliability as first-order concerns, not afterthoughts. Proven track record of standing up or strengthening LiveSite culture inside engineering orgs.
Excel at simplifying complexity and creating logical clarity, using strong communication skills to drive organizational changes across product, data, and security domains.
Decision-making is driven by outcomes, and you are comfortable prioritizing initiatives that deliver the most significant business impact, even if it means narrowing the scope.
Hands-on experience with AI; you are AI-curious and eager to leverage it to drive smarter, more efficient infrastructure operations.
Technical expertise is both broad and deep: you command public cloud services, container orchestration, service mesh, data storage, and observability at a systems level, and you remain hands-on when it counts, able to roll up your sleeves and debug a complex distributed-systems problem alongside your team.
Build and nurture a culture of ownership and continuous improvement, encouraging your teams to innovate and excel while maintaining a strong focus on customer value.
Keen ability to balance strategic thinking with tactical execution, ensuring infrastructure solutions not only meet current needs but are future-proof and scalable.
An org builder: you know how to scale a team thoughtfully, develop high-potential engineers into managers, and create the conditions for leaders to emerge and grow.
Raise the bar relentlessly: set high standards for engineering excellence, hold yourself and your teams accountable to those standards, and invest continuously in growing the people around you.
Committed to fostering a learning environment, continually seeking ways to improve team skills, processes, and technologies to align with Klaviyo's growth and innovation objectives.
Bring a security-first mindset to infrastructure decisions, with experience partnering closely with security engineering teams to build platforms that are secure by design and operationally hardened.

Nice To Haves

Previous experience in transforming internal platforms into productized services, complete with SLAs and developer experience-focused roadmaps.
Background in architecting data-centric and event-driven systems at a large scale, particularly within a high-growth environment.
Established partnerships with centralized data platforms, defining clean ownership boundaries and integrating efficiently with existing workflows.
Proven track record of optimizing both cost-to-serve and reliability metrics in a scaling SaaS company, driving significant impact on the bottom line.
Familiarity with the challenges and opportunities characteristic of a high-growth SaaS milieu, with a focus on leveraging those to drive platform innovation and efficiency.

Responsibilities

Lead the definition of platform primitives such as compute runtimes, storage options, and service networking, ensuring they are scalable, secure, and aligned with Klaviyo's standards for operational excellence.
Create and disseminate golden paths and decision trees that simplify the technological choices for R&D teams, enhancing consistency and self-sufficiency across engineering efforts.
Drive initiatives that enhance the reliability of production systems, focusing on incident prevention, transparent response protocols, and proactive capacity planning.
Coordinate with product teams to identify and eliminate infrastructure bottlenecks, shortening time-to-market for new services and increasing developer satisfaction.
Establish frameworks for cost-effective infrastructure management, balancing financial discipline with flexibility and efficiency to maximize value delivery.
Mentor and develop high-performing teams, fostering a culture of inclusivity and ownership, while setting clear, impactful goals that align with business priorities.
Collaborate with cross-functional partners to manage platform investments, clarify ownership, and safely implement infrastructure changes that drive strategic outcomes.
Track and report critical performance metrics, such as system reliability, developer productivity, and infrastructure costs, enabling data-driven decision-making and accountability.
Optimize the use of AI to enhance infrastructure management and development processes, pioneering innovative workflows that keep Klaviyo at the forefront of technological advancement.
Champion operational readiness by establishing robust SLAs and SLIs, ensuring all infrastructure components meet defined performance thresholds conforming to Klaviyo's quality standards.
Facilitate a culture of continuous learning and experimentation with AI tools, deploying enhancements that intelligently streamline engineering workflows.
Lead a disciplined approach to incident management and postmortems, establishing a blameless culture of learning and innovation to minimize future disruptions.