About The Position

This position is open at a global product-led IT company specializing in infrastructure stability and security solutions. Its products are recognized as an industry standard in the Hosting and Enterprise segments, powering over 500,000 servers worldwide. In 2025, the company is evolving its data management strategy, shifting from traditional database administration to an internal Database-as-a-Service (DBaaS) model. The role calls for an engineer who can design resilient distributed systems, automate infrastructure as code, and turn databases into a reliable service for product teams. It is an ideal opportunity for anyone ready to work with petabytes of data and build high-scale platform solutions.

Requirements

  • 5+ years of PostgreSQL expertise: deep knowledge of MVCC, locking mechanics, expert-level Patroni/PgBouncer configuration, and experience with seamless major version upgrades under load.
  • ClickHouse mastery: experience operating large clusters, understanding ZooKeeper/ClickHouse Keeper, sharding, replication internals, and performance diagnostics at the data-part level.
  • Engineering mindset (SRE/DevOps): experience writing complex Terraform modules and Ansible roles; proficiency in Python or Go for automation is a major asset.
  • Hybrid environment experience: understanding the nuances of running DBs on Bare Metal vs. Kubernetes vs. Public Cloud, with the ability to optimize TCO and disk subsystem performance (NVMe, Network Storage).
  • Systems approach: understanding of the full stack from network packets to business logic, including security requirements (FIPS compliance, audit logging) and Disaster Recovery.

Nice To Haves

  • Experience building an Internal Developer Platform (IDP).
  • Experience operating databases in Kubernetes via operators (CloudNativePG, Altinity Operator).
  • Background working with Cloud or Hosting providers on similar services.

Responsibilities

  • Designing and implementing a self-service platform (Terraform + Ansible) for deploying HA clusters (PostgreSQL, ClickHouse, MongoDB, Redis) in a heterogeneous environment (Bare Metal, OpenNebula, K8s, Public Clouds).
  • Managing rapidly growing ClickHouse analytics clusters (12+ clusters, tens of terabytes), with a focus on sharding, ReplicatedMergeTree, and building reliable S3 backup pipelines under high load.
  • Maintaining and scaling infrastructure for Apache Airflow and Redash, ensuring the reliability of ETL pipelines and visualization tools.
  • Implementing SRE practices in data management: replacing manual incident response with automated self-healing mechanisms and defining SLOs/SLIs.
  • Migrating legacy solutions to modern cloud patterns and implementing Kubernetes operators for stateful workloads.
  • Serving as a technical authority for product teams, guiding them in optimizing data schemas and SQL queries for high-load systems.

Benefits

  • Fully remote work from any location worldwide and flexible working hours.
  • Opportunity to impact architectural decisions for services used by thousands of companies globally.
  • 24 days of vacation, 10 national holidays, and unlimited paid sick leave.
  • Compensation for private medical insurance.
  • Reimbursement for co-working spaces and gym/sports activities.
  • Dedicated budget for education, training, and conferences.
  • Reward program for innovative ideas that lead to company patents.