Senior Site Reliability Engineer

Laravel

Remote

About The Position

At Laravel, we don’t just build tools; we build the foundation that empowers millions of developers to ship their dreams. We are looking for a Senior Site Reliability Engineer to help us scale that mission by ensuring our global infrastructure remains as elegant and reliable as the code we write. If you are energized by the challenge of managing multi-region Kubernetes clusters, building robust observability systems, and solving complex operational puzzles with code, you’ve found your next home.

Description of the Role

As a Senior Site Reliability Engineer, you will be a founding member of our dedicated SRE function, reporting directly to Florian Beer. This is a high-impact, autonomous role where you will design and implement the systems that power Laravel Cloud, Nightwatch, Forge and Vapor. You will act as a bridge between development and operations, advocating for a blameless culture and shared responsibility for reliability across the entire organization.

Your 12-Month Mission

Imagine we are all at Laracon in 12 months' time. You are telling the team about your first year, and the impact is undeniable:

First 30 Days: You have stabilized our incident response by creating comprehensive, actionable runbooks for our core alerts.
Day 60: You have pioneered "observability as code" by migrating our alert rules and dashboards into version control.
Day 90: You have established clear, data-driven SLOs for all customer-facing products, giving us a unified language for reliability.
Year One: You have transformed our visibility by creating beautiful, insightful dashboards used by the entire company and have significantly reduced manual toil through sophisticated automation.

Requirements

Infrastructure Mastery: Deep experience with Linux system administration and cloud platforms, specifically AWS.
Orchestration & IaC: Proficiency with Kubernetes, Docker, and managing infrastructure via Terraform.
Programming Skills: The ability to solve problems with software and scripting using PHP, Bash, or Go.
Systems Thinking: A smart and passionate approach to troubleshooting, with the ability to break complex systems down into components that can be triaged individually.
Reliability Mindset: Experience with SLO/SLI/SLA definition, capacity planning, and performance tuning.
Soft Skills: A commitment to documentation, cross-team collaboration, and an automation-first mindset.

Nice To Haves

Framework Familiarity: Previous experience working with the Laravel framework or our existing product suite (Cloud, Forge, Vapor, etc.) is highly preferred.
Advanced Observability: Experience with Prometheus and Grafana Mimir for metrics storage and alerting.
Cost Optimization: Specialized knowledge in managing and optimizing resource usage and cloud costs.

Responsibilities

Architect Reliability: Establish SRE as a core function at Laravel, building the fundamentals from the ground up.
System Design: Design, build, and maintain multi-region Kubernetes infrastructure and global distributed systems.
Automation: Solve operational challenges through software, reducing manual intervention (toil) for our product teams.
Observability: Design and implement monitoring, logging, and alerting systems using tools like Prometheus, Grafana, and Loki.
Collaboration: Partner with product leads and SecOps to make reliability a shared responsibility.
Incident Response: Lead incident reviews and postmortems in a strictly blameless environment to foster continuous learning.