About The Position

At Klaviyo, we value the unique backgrounds, experiences and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace each and every day. We believe everyone deserves a fair shot at success and appreciate the experiences each person brings beyond the traditional job requirements. If you’re a close but not exact match with the description, we hope you’ll still consider applying. Want to learn more about life at Klaviyo? Visit klaviyo.com/careers to see how we empower creators to own their own destiny.

Team Overview: The mission of the Asynchronous Processing team is to design, build, and operate Klaviyo’s high-scale, event-driven backbone so that every product team can reliably move data through our systems with confidence, no matter the volume. By providing opinionated, self-service platforms for queueing and background processing–grounded in technologies like Golang, Python, Apache Pulsar, Kafka, SQS, AWS, and Kubernetes–we enable engineering teams to focus on customer impact while we ensure their workloads are fast, observable, cost-efficient, and resilient by default.

As a Senior Platform Engineer on the Asynchronous Processing team, you will architect, build, and operate the large-scale, event-driven backbone that powers Klaviyo, crafting paved-path primitives on top of Golang, Python, Apache Pulsar, Kafka, and SQS running in AWS and Kubernetes. You’ll design and evolve resilient, multi-tenant queueing and processing platforms that safely handle massive spikes in traffic, process millions of messages per second with low latency, and provide simple, self-service APIs and tooling that enable product teams to move fast and with confidence. From core data models and routing patterns to observability, autoscaling, and failure isolation, you’ll own systems that must be rock solid at scale, partnering closely with engineers across the company to turn complex distributed-systems problems into intuitive, reliable building blocks.

This role is for senior engineers who love taming large-scale distributed systems and turning real-world asynchronous workloads into reliable, scalable platforms. You’ll work on highly visible infrastructure at the heart of Klaviyo’s event-driven backbone, directly shaping how thousands of Klaviyos build, ship, and operate products every day.

Requirements

  • You are passionate about building software effectively and for the long term, balancing technical quality, velocity, and business impact.
  • BA or BS degree in Computer Science or a related field, or equivalent experience.
  • You typically bring 6+ years of hands-on software development experience building and operating highly available, full-stack SaaS products at scale, or have demonstrated equivalent proficiency and impact.
  • You are independently responsible for the full lifecycle of complex projects or features, including discovery, technical design, implementation, rollout, and ongoing maintenance. You provide clear technical direction for others involved.
  • You own the operational health of the systems you build, including performance, reliability, and observability. You are comfortable defining and upholding SLOs, participating in on-call, and driving follow-through on incidents and RCAs.
  • Familiarity with cloud infrastructure (AWS preferred), infrastructure-as-code (Terraform), and containerized environments (Kubernetes), including how to design services that run reliably at scale in those environments.
  • Expertise and hands-on experience with asynchronous processing and queueing systems like SQS, Kafka, or Apache Pulsar.
  • Ability to stay composed and manage complex systems during outages, and to drive failures through root cause analysis to the prevention of future issues.
  • You’ve already experimented with AI in work or personal projects, and you’re excited to dive in and learn fast. You’re hungry to responsibly explore new AI tools and workflows, finding ways to make your work smarter and more efficient.
  • You question convention and proactively look for ways to improve, whether that’s workflows, architecture, tooling, or team processes. You are intrinsically motivated to drive incremental and step-change improvements that deliver value to the business.
  • You are an excellent communicator and collaborator. You lead technical discussions at the project and product-area level, write clear technical design documents and RFCs, and keep stakeholders aligned on progress, risks, and trade-offs.
  • You mentor and support other engineers, offering thoughtful feedback in design and code reviews, helping refine specifications, and investing in the growth of more junior engineers on the team.
  • You enjoy working on small, autonomous, agile teams, shipping early and often, pairing with product managers, business stakeholders, and other engineers to craft better software.

Nice To Haves

  • Fundamental understanding of Linux and all layers of the networking stack; you should be confident administering and debugging production Linux systems.
  • Practical experience building high availability systems at scale with Apache Pulsar.

Responsibilities

  • Build a deep understanding of engineering needs across the organization, guiding the design and development of appropriate platform primitives in queueing that align with the platform's vision and practically empower product teams.
  • Design, develop, and deliver software to dramatically improve the availability, scalability, latency, and efficiency of Klaviyo's asynchronous and queueing services.
  • Design and develop systems and processes that enable highly available and scalable systems, with a focus on asynchronous processing.
  • Leverage technology such as Python, Golang, AWS, and Kubernetes to advance Klaviyo's platform, with a deep focus on Apache Pulsar, SQS, and Kafka.
  • Champion best practices by actively collaborating with other teams in a culture that values technical design review.
  • Mentor and pair with other Klaviyo engineers to build better software by focusing on performance, self-healing systems, configuration as code, and defensive programming.
  • Participate in periodic on-call duties with a focus on solving issues when they are discovered, preventing recurrences, and minimizing alert fatigue.