About The Position

CloudLinux / TuxCare is a remote-first infrastructure and security company with over 300 engineers building and operating products for hosting providers, enterprises, and internal service teams globally. The Infrastructure Department manages the platforms for CloudLinux OS, Imunify, KernelCare, TuxCare ELS, and their engineering systems.

We are seeking a Senior Database Reliability Engineer to join the Infrastructure DBA cell. This is a hands-on role focused on production ownership, not just ticket processing. The engineer will ensure the reliability of critical database services, automate repetitive tasks, support engineering teams, and reduce single-person dependencies in our PostgreSQL, ClickHouse, MongoDB, and Redis operations.

PostgreSQL is the primary requirement. ClickHouse experience is a strong plus but not a mandatory day-one skill. The ideal candidate has enough depth in databases, Linux, automation, and incident response to learn our ClickHouse environment quickly and operate it safely.

Requirements

  • Deep hands-on PostgreSQL experience in business-critical production environments (typically 5+ years or equivalent depth).
  • Strong understanding of PostgreSQL internals and operations: MVCC, WAL, transactions, locks, indexes, query planning, replication, autovacuum, bloat, major upgrades, backups, PITR, and restore testing.
  • Proven experience with highly available databases and the ability to reason about quorum, split-brain risk, failover, rollback, and recovery.
  • Strong Linux and infrastructure fundamentals: systemd, networking, storage, filesystems, CPU/memory/disk bottlenecks, TLS, DNS, firewalls, and root-cause troubleshooting.
  • Automation skills with Ansible and scripting.
  • Ability to support more than one database engine; readiness to learn ClickHouse quickly and take responsibility for it.
  • Practical use of AI engineering assistants (e.g., Claude, Codex) for improving speed and quality, with personal verification of generated SQL, commands, scripts, and operational conclusions.
  • Clear written English for asynchronous work in Jira, Slack, GitLab, Slite, and runbooks.

Nice To Haves

  • ClickHouse operations: replication, Keeper/ZooKeeper, MergeTree engines, distributed DDL, grants, row policies, backups, query troubleshooting, and cluster recovery.
  • MongoDB replica sets and Percona Backup for MongoDB.
  • Redis/Sentinel and broker/cache failure modes.
  • Database observability, SLOs, golden signals, alert tuning, and executable incident runbooks.
  • Building internal platforms, self-service portals, or DBaaS workflows for engineering teams.
  • Terraform/OpenTofu, GitLab CI/CD, and merge-request based delivery.

Responsibilities

  • Own production PostgreSQL reliability, including HA design, Patroni, PgBouncer, replication, failover, upgrades, vacuum/bloat control, query tuning, locks, indexes, capacity planning, backups, PITR, and restore validation.
  • Improve disaster recovery and operational evidence through tested restores, documented recovery paths, measurable RTO/RPO targets, runbooks, and safe maintenance plans.
  • Support the wider database estate, including ClickHouse, MongoDB, and Redis, by troubleshooting incidents, reviewing access and data-safety changes, improving monitoring, and learning production ClickHouse patterns.
  • Automate DBA workflows using Ansible, Terraform/OpenTofu, GitLab CI/CD, and scripts for provisioning, grants, backups, restores, health checks, and ownership metadata.
  • Help build DBaaS-style self-service capabilities so that engineering teams can request databases, access, credentials, and operational checks with less manual DBA intervention.
  • Enhance observability and incident response through Grafana, metrics, logs, SLOs, alert rules, Opsgenie routing, and clear communication during production issues.

Benefits

  • Focus on professional development.
  • Interesting and challenging projects.
  • Fully remote work with flexible working hours.
  • 24 days of paid vacation per year.
  • 10 days of national holidays.
  • Unlimited sick leave.
  • Compensation for private medical insurance.
  • Co-working and gym/sports reimbursement.
  • Budget for education.
  • Opportunity to receive a reward for the most innovative idea that the company can patent.