Senior Site Reliability Engineer

Runlayer

About The Position

AI is transforming how every company operates, but most enterprises are stuck. They want to move fast with AI agents, tools, and workflows, but they can't do it safely. We're fixing that.

Our team built AI Actions for OpenAI, shipped Zapier Agents to millions of users, and launched the first remote MCP server with Anthropic. The co-creator of MCP is on our cap table. We helped establish the protocol, and now we're building the platform enterprises need to actually use it.

Runlayer is one platform for MCPs, Skills, and Agents. Purpose-built security, fine-grained governance, and complete observability so organizations can push AI forward across the entire company without the risk. We raised $11M from Khosla Ventures and Felicis, and customers include Gusto, Instacart, and Opendoor. We're a team of 25, mostly engineers, shipping fast. If you want to work at the center of how AI gets things done, this is the moment.

As our Site Reliability Engineer, you'll own the reliability, performance, and scalability of Runlayer's infrastructure as we grow to serve enterprise customers across cloud and on-prem environments.

Why You'll Thrive Here

Impact: Build the infrastructure foundation for the enterprise MCP platform, directly enabling AI adoption at scale
Excellence: Work closely with founders and a small, senior engineering team shipping fast in a high-growth environment
Ownership: Own reliability end-to-end, from database performance to incident response to CI/CD pipelines

Requirements

Strong AWS experience, particularly ECS, Aurora, and CloudWatch
GCP experience as we expand cross-cloud
Kubernetes and container orchestration expertise
DBRE experience with database performance tuning
CI/CD pipeline ownership and incident response experience
Background at a B2B SaaS company serving enterprise customers, ideally in infrastructure

Nice To Haves

Experience deploying and supporting on-prem or hybrid environments
Python backend familiarity (our platform is Python-based)
Experience at an early-stage or high-growth company

Responsibilities

Own reliability and performance of our cloud infrastructure across AWS (ECS, Aurora, CloudWatch) and GCP
Manage and optimize Kubernetes clusters and container orchestration
Drive database reliability engineering, including performance tuning and scaling
Build and maintain CI/CD pipelines for rapid, safe deployments
Run incident response and on-call rotations
Partner with product engineers to design scalable, resilient systems

Benefits

Competitive salary and equity — compensation that reflects your expertise and customer-facing responsibilities.
Paid time off — 4 weeks paid vacation, paid sick leave, and paid parental leave.
Professional development — budget for conferences, courses, and certifications in AI, enterprise software, and customer success.
Top-tier equipment — your choice of laptop and accessories to create your ideal work environment.
Health benefits — comprehensive health, dental, and vision coverage.
Customer interaction opportunities — work directly with innovative companies and see the immediate impact of your work.
