Cluster & Systems Capacity Engineer

Backblaze External Website

$123,000 - $175,000 • Remote

About The Position

Backblaze is seeking a highly analytical, systems-oriented Cluster & Systems Capacity Engineer to drive the planning, forecasting, deployment, and optimization of hardware infrastructure across our global cloud storage platform. This role ensures that Backblaze’s storage clusters, compute systems, and network infrastructure scale reliably, cost-efficiently, and ahead of demand. You will build and maintain predictive models, ensure consistent supply and demand alignment, and partner cross-functionally to inform strategic investment and deployment decisions. This is a high-impact role within Cloud Operations, directly contributing to service availability, durability, performance, margin optimization, and long-term platform scalability.

Requirements

Bachelor’s degree in Computer Science, Engineering, Mathematics, Data Science, Information Systems, Statistics, or a related technical field (or equivalent experience).
3-6+ years of experience in Site Reliability Engineering, Infrastructure Capacity Planning, Systems/Infrastructure Engineering, Production Engineering, Data Center Operations, or a similar Cloud Operations role
Familiarity and experience working with Cloud Storage infrastructure, particularly highly available, large-scale distributed systems supporting large amounts of data with high throughput and complex performance requirements
Background in capacity modeling, performance analysis, scenario modeling, and/or infrastructure cost optimization, with an ability to quantify ideas within financial frameworks and forecasts.
Proficiency in database and data analysis tools (preferably Snowflake, Metabase, Grafana, Python, SQL, Prometheus, VictoriaMetrics, and Excel/Google Sheets)
Demonstrated deep, creative, and logical thinking complemented by a strong data analysis skillset
Excellent communication and documentation skills, with the ability to share knowledge and explain concepts accurately and concisely
Desire to work on a highly autonomous team that cares deeply about quality, cost, and the customer experience

Responsibilities

Develop and maintain short-, medium-, and long-term capacity demand and hardware deployment forecasts across storage, compute, and network domains within the platform
Build predictive models that translate business demand signals into infrastructure requirements using historical utilization, growth trends, product sales plans, hardware lifecycle roadmaps, and other key business inputs
Partner with Infrastructure, Production, and Network Engineering teams to align capacity plans with system design and scaling initiatives
Develop and automate forecasting pipelines, simulation calculators and tools, and capacity dashboards to improve data quality, reduce manual analysis, and give stakeholders clear visibility into platform usage and cluster health metrics
Monitor and analyze cluster and system-level utilization and performance across CPU, memory, IOPS, and network resources
Adjust deployment plans and recommended configurations in real time to maintain adequate headroom and system stability in support of delivering a world-class customer experience
Partner with service and platform owners to develop headroom and live buffer policies, optimize hardware BoMs, leverage virtualized orchestration, and reduce product cost
Work in lockstep with Operations and Finance peers to align capacity plans and hardware requirements with capital budgets, cost targets, and financial outcomes
Support strategic optimization initiatives across infrastructure investments, engineering development, and operations processes, contributing to long-term infrastructure strategy and capital planning
Lead efforts to evaluate, procure, and provision requests for new or additional hardware, working with Systems and Network Engineering, SRE, NOC, and Data Center Operations teams to identify and deliver optimal solutions
Maintain alignment with Product and Sales to support customer onboarding, growth, and demand variability
Communicate complex capacity and infrastructure insights clearly to technical and non-technical stakeholders

Benefits

Healthcare for family, including dental and vision
Competitive compensation and 401(k)
RSU grants for full-time employees
Employee stock purchase plan (ESPP)
Flexible vacation policy
Maternity & paternity leave
MacBook Pro to use for work, plus a generous stipend to personalize your workstation
Childcare bonus (human children only)
Fertility treatment and support
Learning & development program
Commuter benefits
Culture that supports a healthy work-life balance