Database Engineer

Wavelo

Remote

About The Position

We are looking for a highly skilled Database Reliability Engineer (DBRE) with deep expertise in PostgreSQL at scale. In this role, you will design, operationalize, and optimize the data persistence layer that powers large-scale, mission-critical systems. You’ll work closely with SRE, Platform, and Engineering teams to ensure performance, reliability, automation, and operational excellence across the database environment. This is a hands-on engineering role focused on building resilient data infrastructure, well beyond traditional database administration. This is a remote position open to applicants based in Canada.

Requirements

Deep understanding of PostgreSQL internals: MVCC, WAL processing, vacuum behavior, locking, query planning
Experience designing and operating highly available database clusters with automated failover
Strong performance tuning skills (query optimization, indexing, workload tuning)
Ability to diagnose database and system issues: Query plans, I/O, memory usage, WAL growth, table/index bloat
Experience with backup and recovery strategies: Point-in-time recovery (PITR), durability planning
Familiarity with observability and monitoring: Metrics, alerting, and performance dashboards (Grafana)
Understanding of distributed systems concepts: Service discovery, consensus (e.g., Consul)
Strong Linux systems knowledge (performance tuning, resource management)
Experience with scripting and infrastructure-as-code automation
Strong troubleshooting and problem-solving skills in production environments
Knowledge of security, compliance, encryption, auditing, and access control
Ability to work independently in high-availability, production-critical systems
Familiarity with AI-assisted tools (e.g., Claude, Windsurf, GitHub Copilot)
7+ years of hands-on PostgreSQL experience in large-scale, high-volume production environments
Strong expertise in PostgreSQL internals: WAL, MVCC, vacuum tuning, query planner, indexing, logical replication
Advanced SQL and strong schema design and query optimization skills
Solid experience with Linux systems and networking fundamentals
Experience building automation using Go or Python
Experience with monitoring tools such as Prometheus, Grafana, Datadog, PMM, and pg_stat_statements

Nice To Haves

Experience with connection pooling and load balancing: PgBouncer, HAProxy
Experience with high-availability solutions: Patroni or similar tools
Exposure to event streaming and CDC: Kafka, Debezium
Experience supporting 24/7 production environments
Experience with PostgreSQL backup tools: Barman, pgBackRest, WAL-G
Familiarity with Traefik or similar infrastructure components

Responsibilities

Design, implement, and operate highly available PostgreSQL clusters (physical/logical replication, sharding, partitioning, failover automation)
Optimize query performance and indexing strategies
Perform capacity planning, growth forecasting, and workload modeling
Own high-availability strategies, including automatic failover, multi-region deployments, and disaster recovery
Build and maintain automation for provisioning and configuration, backups and recovery, failovers, vacuum tuning, and schema management
Use tools such as Terraform, Ansible/SaltStack, Bash, and Python
Develop monitoring and alerting systems for PostgreSQL clusters
Lead response during database incidents (e.g., performance regressions, replication lag, deadlocks, bloat, storage failures)
Conduct root-cause analysis and implement long-term fixes
Partner with software engineers to review SQL queries, optimize schemas, and ensure effective use of PostgreSQL features
Provide guidance on database design patterns, migrations and version upgrades, and best practices