Cluster & Systems Capacity Engineer

Backblaze External Website
$123,000 - $175,000 · Remote

About The Position

Backblaze is seeking a highly analytical, systems-oriented Cluster & Systems Capacity Engineer to drive the planning, forecasting, deployment, and optimization of hardware infrastructure across our global cloud storage platform. This role ensures that Backblaze’s storage clusters, compute systems, and network infrastructure scale reliably, cost-efficiently, and ahead of demand. You will build and maintain predictive models, ensure consistent supply and demand alignment, and partner cross-functionally to inform strategic investment and deployment decisions. This is a high-impact role within Cloud Operations, directly contributing to service availability, durability, performance, margin optimization, and long-term platform scalability.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, Mathematics, Data Science, Information Systems, Statistics, or a related technical field (or equivalent experience)
  • 3-6+ years of experience in Site Reliability Engineering, Infrastructure Capacity Planning, Systems/Infrastructure Engineering, Production Engineering, Data Center Operations, or a similar Cloud Operations role
  • Familiarity and experience working with Cloud Storage infrastructure, particularly highly available, large-scale distributed systems supporting large amounts of data with high throughput and complex performance requirements
  • Background in capacity modeling, performance analysis, scenario modeling, and/or infrastructure cost optimization, with an ability to quantify ideas within financial frameworks and forecasts
  • Proficiency in database and data analysis tools (preferably Snowflake, Metabase, Grafana, Python, SQL, Prometheus, VictoriaMetrics, and Excel/Google Sheets)
  • Demonstrated deep, creative, and logical thinking complemented by a strong data analysis skillset
  • Excellent communication and documentation skills, with the ability to share knowledge and explain concepts accurately and concisely
  • Desire to work on a highly autonomous team that cares deeply about quality, cost, and the customer experience

Responsibilities

  • Develop and maintain short-, medium-, and long-term capacity demand and hardware deployment forecasts across storage, compute, and network domains within the platform
  • Build predictive models that translate business demand signals into infrastructure requirements using historical utilization, growth trends, product sales plans, hardware lifecycle roadmaps, and other key business inputs
  • Partner with Infrastructure, Production, and Network Engineering teams to align capacity plans with system design and scaling initiatives
  • Develop and automate forecasting pipelines, simulation calculators and tools, and capacity dashboards to improve data quality, reduce manual analysis, and give stakeholders clear visibility into platform usage and cluster health metrics
  • Monitor and analyze cluster and system-level utilization and performance across CPU, memory, IOPS, and network resources
  • Adjust deployment plans and recommended configurations in real time to maintain adequate headroom and system stability in support of delivering a world-class customer experience
  • Partner with service and platform owners to develop headroom and live buffer policies, optimize hardware BoMs, leverage virtualized orchestration, and reduce product cost
  • Work in lockstep with Operations and Finance peers to align capacity plans and hardware requirements with capital budgets, cost targets, and financial outcomes
  • Support strategic optimization initiatives across infrastructure investments, engineering development, and operations processes, contributing to long-term infrastructure strategy and capital planning
  • Lead efforts to evaluate requests for new or additional hardware and to procure and provision it, working with Systems and Network Engineering, SRE, NOC, and Data Center Operations teams to identify and deliver optimal solutions
  • Maintain alignment with Product and Sales to support customer onboarding, growth, and demand variability
  • Communicate complex capacity and infrastructure insights clearly to technical and non-technical stakeholders

Benefits

  • Healthcare for family, including dental and vision
  • Competitive compensation and 401(k)
  • RSU grants for full-time employees
  • Employee stock purchase plan (ESPP)
  • Flexible vacation policy
  • Maternity & paternity leave
  • MacBook Pro to use for work, plus a generous stipend to personalize your workstation
  • Childcare bonus (human children only)
  • Fertility treatment and support
  • Learning & development program
  • Commuter benefits
  • Culture that supports a healthy work-life balance