Senior Software Engineer, Developer Platform

Decagon•San Francisco, CA

Onsite

About The Position

Decagon is the leading conversational AI platform empowering every brand to deliver concierge customer experiences. Our technology enables industry-defining enterprises like Avis Budget Group, Block’s Cash App and Square, Chime, Oura Health, and Hunter Douglas to deploy AI agents that power personalized, deeply satisfying interactions across voice, chat, email, SMS, and every other channel.

We’re building a future where customer experiences are being redefined from support tickets and hold music to faster resolutions, richer conversations, and deeper relationships. We’re proud to be backed by world-class investors who share that vision, including a16z, Accel, Bain Capital Ventures, Coatue, and Index Ventures, along with many others.

We’re an in-office company, driven by a shared commitment to excellence and velocity. Our values (Just Get It Done, Invent What Customers Want, Winner’s Mindset, and The Polymath Principle) shape how we work and grow as a team.

The Infrastructure team builds and operates the foundations that power Decagon: networking, data, ML serving, developer platform, and real-time voice. We partner closely with product, data, and ML to deliver high-scale, low-latency systems with clear SLOs and great developer ergonomics.

We’re looking for a Senior Software Engineer to help build and evolve our internal developer platform: everything from CI/CD and release automation to observability standards, platform tooling, and developer workflows that remove friction. This role is for someone who loves making other engineers faster: reducing build times, eliminating flaky tests, creating paved roads for service creation/deployment, and raising the bar on operability by default. Roles like this often combine “builder” energy with strong empathy for how engineers actually work.

Requirements

4+ years building production software, with meaningful experience in platform / devtools / infrastructure (or adjacent SRE/release engineering).
Strong coding ability in at least one systems/productivity language (e.g., Python, TypeScript/JS), and comfort building developer-facing tooling (CLIs, libraries, automation).
Hands-on experience with CI/CD systems and designing pipelines that are scalable and reusable across many repos/services.
Practical experience with observability in production systems (instrumentation, alerting, dashboards, incident response).
Comfort with containers and modern cloud infrastructure (e.g., Docker/Kubernetes and related tooling).
A track record of improving developer experience through measurable outcomes (faster builds, fewer flakes, safer deploys, fewer incidents).
Strong cross-team collaboration and communication—especially writing clear docs and driving adoption.

Nice To Haves

Experience with monorepos, build systems, and/or large-scale CI performance work.
Experience building internal platforms: service templates, paved-road deployment, self-serve environments, developer portals.
Infrastructure-as-code experience (e.g., Terraform) and a security-minded approach to supply chain (provenance, secrets, least privilege).
Experience applying AI-assisted tooling to make engineers dramatically more effective.

Responsibilities

Developer productivity & platform tooling
Identify workflow bottlenecks (build/test/release/local dev) and build tools that measurably reduce toil.
Create and maintain “golden paths” like service templates, CLIs, libraries, and automation that teams rely on.

CI/CD & release engineering
Design reusable CI pipelines and deployment workflows that are fast, safe, and easy to adopt across teams.
Improve reliability of builds and tests (flake reduction, hermeticity, caching) and drive down cycle time.
Support progressive delivery patterns (canary / blue-green) and safe rollback mechanisms.

Observability & operational excellence
Establish shared observability primitives (metrics/logs/traces), standards, and libraries so services are production-ready by default.
Partner with product engineers to improve operability: SLOs, alerting hygiene, dashboards, incident learnings.

Infrastructure foundations
Build and improve core platform capabilities that make it easy to run and scale services.

Ownership & reliability
Own the systems you build end-to-end and help keep them healthy in production, improving reliability over time.