Senior Site Reliability Engineer

OfficeSpace Software

About The Position

You own the performance, reliability, and cost efficiency of OfficeSpace’s production platform at scale. As a Senior Site Reliability Engineer, you shape how our systems run—fast, resilient, and predictable—while leading the shift from manual operations to AI-assisted reliability engineering. We provide the platform. You make it perform.

Requirements

  • 7+ years operating and evolving large-scale production systems.
  • Deep Linux systems expertise with hands-on performance tuning across CPU, memory, disk, and networking.
  • Strong Python skills for automation, tooling, and AI-assisted systems workflows.
  • Production experience with Ruby/Rails ecosystems, including Puma and Sidekiq.
  • Proven ability to diagnose and resolve complex database performance issues (MySQL/MariaDB or PostgreSQL).
  • Advanced Kubernetes experience—workload sizing, scheduling, and multi-tenant operations.
  • Infrastructure-as-code mastery using Terraform and Terragrunt.
  • Experience with configuration management tools such as Puppet or Ansible.
  • Strong observability instincts across metrics, logs, and traces using tools like Prometheus, Grafana, Datadog, or ELK.
  • AI fluency—comfortable supervising AI agents for analysis, testing, and reporting, and validating their outputs.
  • A builder mindset. You move fast, take ownership, and raise standards.

Nice To Haves

  • Scaling and refactoring monolithic applications under real production load.
  • Extracting databases or stateful components from monoliths.
  • Apache and Nginx tuning at scale.
  • Redis performance optimization and operational management.
  • CI/CD systems and GitOps workflows, including ArgoCD.
  • Cloud cost optimization and FinOps-aligned operational practices.

Responsibilities

  • Drive measurable improvements in latency, throughput, and availability across a large-scale production environment.
  • Own system performance—from Linux internals to Kubernetes scheduling—and eliminate bottlenecks before customers feel them.
  • Define and enforce SLIs, SLOs, and error budgets that balance speed, reliability, and growth.
  • Partner with application engineers to profile code paths, improve execution efficiency, and harden services under real load.
  • Lead database performance optimization across queries, indexing, replication, and workload isolation.
  • Design and oversee AI-assisted load testing, stress testing, and capacity planning workflows.
  • Guide the migration from monolithic deployments to multi-tenant Kubernetes platforms.
  • Reduce infrastructure spend through architectural decisions, right-sizing, and intelligent scaling strategies.
  • Build and supervise automation for infrastructure provisioning, configuration management, and observability.
  • Set clear operational standards for reliability, performance, and incident response—and raise the bar for how we run production.

Benefits

  • Competitive Benefits and Rewards: OfficeSpace offers comprehensive and competitive benefits packages globally, designed to support our team’s health, well-being, and financial security. We invest in our people so they can excel.