Principal Engineer, Core Infrastructure

Klaviyo
Boston, MA
Remote

About The Position

As a hands-on principal for compute, networking, storage, runtimes (e.g., Kubernetes), CI/CD, and observability, you’ll architect the service platform that lets teams ship fast and safely. This is an individual contributor (IC) role with no direct reports. You will lead via design, code, and incident excellence, setting technical standards and SLOs for platform services.

Requirements

  • 10+ years building and operating cloud platforms (compute, networking, storage, runtimes like Kubernetes), with a track record of multi-region HA and SLO rigor.
  • Deep expertise in Kubernetes, service mesh, Terraform/IaC, CI/CD, and production observability; you ship golden paths and guardrails that lift the whole org.
  • Experience with databases and storage systems, including SQL and NoSQL databases, and object, block, or file storage platforms.
  • Experience bringing AI into platform engineering—from copilot-assisted workflows and intelligent test generation to AIOps for incident triage, anomaly detection, and runbook automation—with clear security and cost boundaries.
  • Ability to lead via design reviews, incident excellence, and SLO/error-budget trade-offs communicated in business terms.
  • Hands-on experience with AI tools and with helping teams adopt them responsibly.

Nice To Haves

  • Achieve >=99.95% SLOs for core services.
  • Achieve 25–50% faster build/deploy times.
  • Reduce developer-reported friction.
  • Integrate approved AI tooling into IDE/CI/CD with repo policies and auditability.
  • Achieve >=70% MAU among eligible engineers for AI tooling.
  • Reduce MTTR by 20–30% via AI-assisted triage.
  • Decrease flaky-test rate through targeted, AI-suggested fixes.
  • Implement cost, security, and compliance controls codified as IaC modules and enforced in paved roads.
  • Experience with enterprise governance, including compliance and audit requirements.
  • Familiarity with GDPR and data privacy considerations in large-scale, production environments.

Responsibilities

  • Architect and evolve the Kubernetes platform, service mesh, networking, storage, and CI/CD pipelines; ship golden paths and IaC modules.
  • Define platform SLOs; use error budgets to guide reliability vs. velocity trade-offs; drive incident learning and readiness reviews.
  • Improve developer velocity (build/deploy times, flaky tests, local dev ergonomics) with measurable results.
  • Lead capacity planning and commitments; build guardrails for cost, security, and compliance with Security/FinOps partners.
  • Write high-impact code, automation, and tooling; mentor across teams and raise the bar on operational excellence.
  • Embed AI in the developer experience—from code generation to observability and incident response—so teams ship faster and safer by default.

Benefits

  • Comprehensive range of health, welfare, and wellbeing benefits
  • Participation in the company’s annual cash bonus plan
  • Equity
  • Sign-on payments