Staff Database Engineer

TensorWave, Las Vegas, NV

About The Position

We’re looking for a Staff Database Engineer to join our team during an exciting phase of growth. In this role, you’ll own database architecture, reliability, and performance engineering for our infrastructure-adjacent database platforms, along with observability, automation, operational maturity, incident response, and engineering leadership. You’ll work closely with cross-functional partners to support business objectives while upholding our standards for excellence, collaboration, and impact.

Requirements

  • 8+ years of production database engineering, database administration, or database architecture experience
  • Strong hands-on experience with PostgreSQL in production environments
  • Strong hands-on experience with MySQL, Percona, or equivalent relational database platforms
  • Experience designing and operating highly available database systems
  • Experience with replication, failover, backup, restore, and disaster recovery validation
  • Deep SQL performance tuning experience, including execution plan analysis, index design, query rewrite, schema optimization, lock contention troubleshooting, and storage and I/O analysis
  • Strong Linux systems knowledge
  • Experience supporting production incidents and performing root cause analysis
  • Experience building or improving database monitoring and observability
  • Ability to work across infrastructure, DevOps, platform, and application engineering teams
  • Ability to define standards, influence architecture, and mentor other engineers without requiring direct management authority

Nice To Haves

  • Experience with SlurmDBD, Slurm accounting databases, or HPC/AI infrastructure database workloads
  • Experience with NetBox or other infrastructure source-of-truth platforms
  • Experience with Percona XtraDB Cluster, ProxySQL, or advanced MySQL/Percona architectures
  • Experience with PostgreSQL HA tooling and replication architectures
  • Experience with Prometheus, Grafana, PMM, Splunk, or similar observability platforms
  • Experience with eBPF/BCC, perf, strace, or other low-level Linux diagnostic tooling
  • Experience supporting databases for SaaS, cloud, HPC, AI infrastructure, or large multi-tenant platforms
  • Experience with MongoDB, Oracle, SQL Server, or other secondary database platforms
  • Experience with database automation using Ansible, Terraform, CI/CD systems, or internal tooling
  • Experience with zero-downtime migrations, major version upgrades, and production database consolidation

Responsibilities

  • Design and own database architecture for critical infrastructure and platform services, including PostgreSQL-backed internal platforms; Slurm accounting and operational databases; NetBox and infrastructure source-of-truth databases; custom internal applications and automation services; observability, inventory, and platform metadata systems; and future database-backed control plane services.
  • Define standard database patterns for high availability, replication, failover, backup and restore, point-in-time recovery, performance baselining, capacity planning, upgrade lifecycle management, access control, and operational security.
  • Establish database design standards for new internal platforms, including schema review, indexing strategy, query design, service ownership boundaries, and production readiness requirements.
  • Operate and improve production database environments across PostgreSQL, MySQL, Percona, and adjacent systems.
  • Own the lifecycle of database systems, including provisioning, configuration, version upgrades, replication topology design, performance tuning, backup validation, disaster recovery testing, decommissioning, documentation, and runbook creation.
  • Troubleshoot and resolve production database issues involving query latency, lock contention, replication lag, storage I/O bottlenecks, connection exhaustion, poor indexing, schema design problems, database capacity constraints, and backup or restore failures.
  • Drive root cause analysis for database-related incidents and convert findings into durable engineering improvements.
  • Serve as the senior database engineering owner for infrastructure-adjacent database platforms, including Slurm and NetBox.
  • For Slurm environments, support and improve database architecture related to SlurmDBD, accounting data, job history, reporting queries, performance and retention strategy, database scaling, backup and recovery, and long-term operational reliability.
  • For NetBox and source-of-truth systems, support PostgreSQL performance, database lifecycle planning, backup and restore validation, data integrity, schema-impact review, and integration patterns with automation systems.
  • Partner with DevOps, Infrastructure, MLOps, and Platform Engineering teams to ensure database-backed systems are designed to scale as the environment grows.
  • Build deep database observability beyond basic dashboards.
  • Develop and maintain visibility into query performance, execution plans, index usage, replication health, locking behavior, buffer/cache efficiency, storage latency, connection pool behavior, and OS-level database bottlenecks.
  • Use tools such as PostgreSQL native statistics, MySQL/Percona tooling, Prometheus, Grafana, PMM, query logs, slow query logs, eBPF/BCC or equivalent low-level profiling tools, and Linux performance tooling.
  • Create performance baselines and alerting standards for critical database platforms.
  • Identify recurring database failure patterns and build preventive monitoring, automation, and operational guardrails.
  • Create database automation patterns that can be integrated with existing infrastructure tooling.
  • Partner with DevOps and Infrastructure Engineering to automate database provisioning, configuration standards, backup verification, health checks, replication checks, user and permission management, upgrade workflows, monitoring deployment, and runbook-driven recovery procedures.
  • Contribute database-specific modules, roles, or workflows to Ansible, CI/CD pipelines, or internal automation platforms where appropriate.
  • Define production database readiness standards for new services before they are promoted into critical environments.
  • Act as the technical lead for major database incidents.
  • Own or support triage, root cause analysis, cross-team coordination, customer or stakeholder impact analysis, postmortems, corrective action plans, and long-term remediation.
  • Mentor L4 and L5 engineers on database operations, SQL troubleshooting, HA design, incident response, and performance analysis.
  • Provide senior technical review for database-impacting changes across infrastructure and platform teams.

Benefits

  • Stock Options
  • 100% paid Medical, Dental, and Vision insurance for Employees
  • Company Health Savings Account Contributions
  • 100% paid Short Term and Long Term Disability Insurance for Employees
  • Life and Voluntary Supplemental Insurance Options
  • Other Insurance Options, such as Pet & Legal Insurance
  • Various Supplementary Health Benefits, such as discounted Virtual Healthcare Appointments and Serious Illness Support
  • Flexible Spending Account
  • 401(k)
  • Employee Assistance Program
  • Flexible PTO
  • Paid Holidays
  • Parental Leave
  • Other In-Office Perks