About The Position

AI is transforming how every company operates, but most enterprises are stuck. They want to move fast with AI agents, tools, and workflows, but they can't do it safely. We're fixing that.

Our team built AI Actions for OpenAI, shipped Zapier Agents to millions of users, and launched the first remote MCP server with Anthropic. The co-creator of MCP is on our cap table. We helped establish the protocol, and now we're building the platform enterprises need to actually use it.

Runlayer is one platform for MCPs, Skills, and Agents. Purpose-built security, fine-grained governance, and complete observability so organizations can push AI forward across the entire company without the risk. We raised $11M from Khosla Ventures and Felicis, and customers include Gusto, Instacart, and Opendoor. We're a team of 25, mostly engineers, shipping fast. If you want to work at the center of how AI gets things done, this is the moment.

As our Site Reliability Engineer, you'll own the reliability, performance, and scalability of Runlayer's infrastructure as we grow to serve enterprise customers across cloud and on-prem environments.

Why You'll Thrive Here

  • Impact: Build the infrastructure foundation for the enterprise MCP platform, directly enabling AI adoption at scale
  • Excellence: Work closely with founders and a small, senior engineering team shipping fast in a high-growth environment
  • Ownership: Own reliability end-to-end, from database performance to incident response to CI/CD pipelines

Requirements

  • Strong AWS experience, particularly ECS, Aurora, and CloudWatch
  • GCP experience as we expand cross-cloud
  • Kubernetes and container orchestration expertise
  • Database reliability engineering (DBRE) experience, including database performance tuning
  • CI/CD pipeline ownership and incident response experience
  • Background at a B2B SaaS company serving enterprise customers, ideally in infrastructure

Nice To Haves

  • Experience deploying and supporting on-prem or hybrid environments
  • Python backend familiarity (our platform is Python-based)
  • Experience at an early-stage or high-growth company

Responsibilities

  • Own reliability and performance of our cloud infrastructure across AWS (ECS, Aurora, CloudWatch) and GCP
  • Manage and optimize Kubernetes clusters and container orchestration
  • Drive database reliability engineering, including performance tuning and scaling
  • Build and maintain CI/CD pipelines for rapid, safe deployments
  • Run incident response and on-call rotations
  • Partner with product engineers to design scalable, resilient systems

Benefits

  • Competitive salary and equity — compensation that reflects your expertise and customer-facing responsibilities.
  • Paid time off — 4 weeks paid vacation, paid sick leave, and paid parental leave.
  • Professional development — budget for conferences, courses, and certifications in AI, enterprise software, and customer success.
  • Top-tier equipment — your choice of laptop and accessories to create your ideal work environment.
  • Health benefits — comprehensive health, dental, and vision coverage.
  • Customer interaction opportunities — work directly with innovative companies and see the immediate impact of your work.