Staff Site Reliability Engineer — Project Volcano

Kong

About The Position

Kong is building Project Volcano, an internal developer platform purpose-built for Kong's engineering ecosystem. Volcano will provide teams with on-demand preview environments, edge deployments, managed PostgreSQL, auth, realtime, and storage APIs, all deeply integrated with Kong products. As the Staff SRE for Volcano, you will be the founding reliability voice for this platform. This role is part of a strategic initiative driven by the Office of the CTO (OCTO). You will partner directly with engineering leadership to define the platform's reliability posture, build its SRE practice from the ground up, and ensure Volcano can scale to serve all of Kong's customers. This is a high-visibility, high-impact role with direct influence on Kong's next-generation developer platform.

Requirements

BS in Computer Science or equivalent; substantial experience at Staff or Principal IC level in SRE/Platform Engineering.
Proven track record building SRE or platform engineering practices for developer-facing platforms or PaaS/SaaS products — ideally at greenfield stage.
Deep Kubernetes expertise: multi-tenant cluster design, networking (CNI, service mesh, ingress), autoscaling, and security hardening.

Responsibilities

Own reliability for Volcano end-to-end: Define and drive SLOs, error budgets, and incident response practices for all Volcano services — edge deployments, managed Postgres, auth, realtime, storage, and the control plane.
Architect the platform's infrastructure: Design and build the multi-region Kubernetes infrastructure, networking, and data plane that powers Volcano's edge deployment pipeline and backend-as-a-service capabilities.
Build the GitOps and CI/CD backbone: Establish deployment automation, canary pipelines, and preview environment provisioning using ArgoCD, Helm, and Terraform/Terragrunt — setting patterns the broader team will follow.
Scale managed data services: Design, operate, and harden multi-tenant PostgreSQL clusters, Redis caching layers, and object storage — with a focus on data isolation, performance, and disaster recovery.
Drive observability from day one: Instrument every Volcano service with meaningful SLIs; build dashboards, alerts, and runbooks using Datadog, Prometheus, and Grafana before services go live, not after incidents.
Lead cross-functional reliability work: Collaborate with the OCTO team, product engineering, and security to build reliability and compliance into Volcano's architecture from the start, rather than bolting them on later.
Set SRE culture and standards: Mentor engineers across Volcano's contributing teams on reliability principles; lead postmortems, define on-call practices, and build a blameless engineering culture.
Evaluate and adopt emerging technologies: Given Volcano's greenfield nature, assess edge runtimes, serverless compute, vector databases, and AI-native infrastructure components, and make the architectural decisions on which to adopt.
