Senior Database Reliability Engineer

Scribe

$145,000 - $230,000 • Hybrid

About The Position

Scribe is seeking a Senior Database Reliability Engineer to own the reliability, performance, and scalability of the company's data tier. The engineering organization is doubling, and the guardrails, automation, and standards this role puts in place will be crucial to sustaining that growth. This is a senior individual contributor position with significant ownership: beyond routine maintenance, the engineer will define how the company works with its databases. The current stack includes Django on PostgreSQL (Aurora Serverless V2), OpenSearch, Redis (ElastiCache), SQS, and RabbitMQ, plus a Change Data Capture (CDC) pipeline that replicates changes from Aurora through AWS DMS into Parquet files on S3 and then into Snowflake. Engineers interact with the database through the Django ORM, which makes migration safety, index design, and query review central to the role.

Requirements

Deep, practical PostgreSQL expertise: fluency reading EXPLAIN (ANALYZE, BUFFERS) output; a solid grasp of MVCC, bloat, lock contention, and vacuum behavior; and the ability to tune Aurora Serverless V2 for latency and throughput.
Experience with an ORM (Django, SQLAlchemy, ActiveRecord, or similar) at production scale, with the ability to predict generated SQL, identify N+1 issues, and understand the trade-offs between joins and batched IN queries.
Production experience running CDC pipelines, ideally with AWS DMS, including familiarity with logical replication, replication slot hygiene, schema evolution, and Parquet-based data lakes feeding Snowflake, BigQuery, or Redshift.
Hands-on experience with pganalyze (or Datadog DBM / pg_stat_statements pipelines), CloudWatch, and Honeycomb (or another high-cardinality tracing tool), and comfort instrumenting code with OpenTelemetry.
Experience running OpenSearch, Redis, and at least one message broker (SQS, RabbitMQ, or Kafka) in production at scale.
Proficiency in writing automation scripts using Python, Go, or similar languages, and experience managing infrastructure with Terraform or comparable Infrastructure as Code (IaC) tools.
Experience using AI coding and review tools in a team setting, including writing or maintaining AGENTS.md files, configuring review agents, and iterating on prompts.

Nice To Haves

Experience with event sourcing on Postgres, or with alternative CDC tooling such as Debezium, Fivetran, or Airbyte.
Experience running PgBouncer or RDS Proxy at scale alongside Django's connection handling.
Deep usage of Honeycomb, including SLOs, BubbleUp, Triggers, and derived columns.
Experience with Snowflake from the producer side, including staging, Snowpipe, and external tables on Parquet.
Experience scaling data infrastructure through rapid engineering headcount growth.
Familiarity with SOC 2 Type II, GDPR, or similar compliance work.

Responsibilities

Own database reliability across Aurora, OpenSearch, Redis, and the CDC pipeline, including schema design reviews, migration safety (locks, backfills, concurrent index builds, NOT VALID constraints), and incident response for the data tier.
Make Django ORM usage scale: catch N+1 patterns in review, extend QuerySet conventions and physical schema standards, and build CI checks and AGENTS.md scaffolding that enforce them.
Operate and evolve the CDC pipeline (Aurora to DMS to S3 Parquet to Snowflake): maintain replication slot hygiene, keep schema evolution safe, and implement automated checks that block migrations likely to break downstream consumers.
Improve observability across pganalyze, CloudWatch, and Honeycomb, including Django-side instrumentation that ties slow ORM queries to specific users, feature flags, and deploys.
Drive multi-AZ resilience within the single-region architecture: Aurora writer/reader placement, failover behavior, RTO/RPO targets, the AZ topology of ElastiCache and OpenSearch, and RabbitMQ survivability.
Build self-service tooling and dashboards that give product and platform teams visibility into their own query footprints, reducing the review burden as the engineering team grows.
Support onboarding and knowledge-sharing for new engineers by writing documentation, running internal sessions on ORM query behavior, and encoding that knowledge in AI review tooling.