Site Reliability Engineer - Backstage

Spotify, New York, NY
Hybrid

About The Position

Are you excited by the idea of building fast, reliable, and intelligent infrastructure for a product used by engineering teams around the world? We’re looking for a Site Reliability Engineer to join the Backstage team at Spotify. We are building the next generation of our developer platform: one that doesn’t just manage software, but actively helps create and maintain it through AI-native workflows. In 2026, SRE isn’t just about uptime; it’s about symbiosis.

As part of our growing engineering team, you’ll help design, build, and operate the cloud infrastructure behind our external developer portal product and our internal fleet of background coding agents. You’ll collaborate closely with experienced engineers (both human and synthetic) while gaining hands-on experience with real-world scale, observability, and the unique security challenges of an agentic production environment. This is a great opportunity for someone who thrives in a startup-like setting, enjoys working at the forefront of infrastructure for AI, and wants to grow their impact by supporting a product that is redefining developer experience globally.

Backstage is more than just a platform; it’s a groundbreaking force in the developer community. Born out of Spotify’s quest for better developer tooling, Backstage now drives the future of developer portals across the globe. But we didn’t stop at catalogs and templates. Today, Backstage is the command center for AI-native engineering. From industry titans orchestrating thousands of autonomous migrations to lean teams using AI to skyrocket satisfaction, our solutions set the new gold standard.

As part of the Backstage team, you’ll shape the developer experience for companies big and small, for our thriving open-source community, and for Spotify itself. You’ll have the unique opportunity to be at the forefront of agentic developer tooling, driving innovation that touches millions.

Requirements

  • Cloud Native & AI Curious: Brings hands-on experience with cloud infrastructure (GCP or AWS) and IaC tools like Terraform, with an interest in LLMs, RAG, or agents in an operational context.
  • Systems Thinker: Understands distributed systems principles and how to operate them reliably at scale, specifically addressing the challenges posed by non-deterministic AI workloads.
  • Polyglot Practitioner: Experienced with at least one modern programming language (e.g., TypeScript, Java, Go, Python) and comfortable navigating codebases where AI-generated PRs are the norm.
  • Quality & Automation: Prioritizes code quality and reliability, looking for ways to build systems that test themselves and improve through automated feedback loops.
  • Growth Mindset: Eager to evolve as an engineer in a landscape where the definition of "operations" changes rapidly.

Nice To Haves

  • Familiarity with open-source projects or building "coding assistant" bots is a plus.

Responsibilities

  • Orchestrate the Fleet: Maintain and improve Portal’s SaaS infrastructure for reliability, security, and scalability. This covers the runtime environments supporting the platform and workflows powered by large language models.
  • Modern Infra-as-Code: Collaborate with senior engineers to build infrastructure on GCP and AWS using Terraform and emerging infrastructure-from-code patterns where agents assist in defining the stack.
  • Support Fullstack Systems: Operate in a modern web stack environment (TypeScript, React, Python). While this isn’t a frontend-heavy role, comfort with debugging fullstack systems and web infrastructure is key.
  • Reliability Engineering: Participate in on-call rotations to ensure systems meet reliability and availability goals, employing AI assistants to accelerate root cause analysis and incident resolution.
  • Collaborate & Innovate: Participate in the planning and delivery of technical projects, defining how infrastructure evolves to support the next wave of generative AI features.

Benefits

  • Health insurance
  • Six-month paid parental leave
  • 401(k) retirement plan
  • Monthly meal allowance
  • 23 paid days off
  • 13 paid flexible holidays
  • Paid sick leave