Principal Site Reliability Engineer

PowerPlan, Inc.
Atlanta, GA
Hybrid

About The Position

This is a principal-level individual contributor role at the heart of our cloud platform’s reliability, scalability, and operational maturity. You will work hands-on across AWS and Azure environments, solving complex production problems while systematically eliminating the manual toil that creates them. The role offers significant autonomy, deep technical impact, and the opportunity to shape how reliability engineering is practiced across the organization.

Company

PowerPlan operates a growing SaaS platform supporting enterprise customers with mission-critical workloads. We run complex, multi-cloud environments and value engineers who take ownership, think in systems, and build solutions that scale. Our culture emphasizes operational excellence, blameless learning, and collaboration across Engineering, Support, Professional Services, and Product teams.

Requirements

  • Deep hands-on experience operating production systems in AWS and Azure environments
  • Strong automation skills using Python and PowerShell in operational contexts
  • Proven ability to identify repetitive operational work and eliminate it through automation
  • Experience leading incident response and blameless post-incident reviews
  • Strong observability expertise, particularly with Grafana and SLI/SLO-driven monitoring
  • Ability to influence engineering practices without formal authority
  • Clear written and verbal communication skills across technical and non-technical audiences

Responsibilities

  • Platform Familiarity Through Escalations & Early Automation (First 90 Days)
  • Eliminate Top Sources of Operational Toil (3–6 Months)
  • Mature Incident Response & Post‑Incident Learning (6–9 Months)
  • Deliver a Mature, SLO‑Aligned Observability Platform (9–12 Months)