Database Site Reliability Engineer

TherapyNotes.com
$120,000 - $160,000 | Remote

About The Position

We are seeking a Database Site Reliability Engineer who demonstrates a strong skill set in managing PostgreSQL. In this role, you will own the reliability and operability of our PostgreSQL services supporting a growing 24x7 SaaS platform, with an emphasis on availability, performance, observability, incident response, and automation. You will partner with cross-functional teams—including developers, operations, and infrastructure—to ensure that our database services run smoothly and efficiently. If you are passionate about operational excellence and continuous improvement, we want to hear from you.

Requirements

  • BS degree in Information Systems, Engineering, or equivalent experience
  • 10+ years of engineering experience in database engineering, systems engineering, DevOps, and/or SRE
  • Experience in cloud-based compute, storage, and containerization solutions (Azure & Kubernetes preferred)
  • Expertise with an observability/monitoring platform (e.g., Prometheus/Grafana, New Relic, Datadog, or equivalent); Datadog experience is a plus
  • Experience working in Agile/DevOps environments and operating production services with ITSM practices where applicable

Nice To Haves

  • Proficiency in operating PostgreSQL in a Linux environment
  • Experience designing and writing ETL pipelines in Python
  • Experience maintaining PostgreSQL ecosystem components such as PgBouncer, pgBackRest, HAProxy, and repmgr
  • Excellent communication and interpersonal skills
  • Some exposure to Terraform
  • Familiarity with pganalyze or Percona

Responsibilities

  • Design, implement, and maintain high-availability, high-throughput, data- and compute-intensive critical database systems running PostgreSQL that support a growing 24x7 SaaS platform.
  • Define and improve database service reliability through monitoring/alerting, SLO-oriented metrics, and operational readiness.
  • Participate in and help drive incident response, root cause analysis, and post-incident corrective actions for database-related production events.
  • Partner with other technical leaders to ensure all newly introduced systems are supportable and maintainable by both development and operations.
  • Provide escalated technical guidance and support to other technology teams throughout the organization.
  • Provide on-call coverage for production support and perform other duties as required.
  • Be accountable for compliance with HIPAA security policies within the database platform.
  • Ensure all solutions and operational activities adhere to the security and operating policies established by the organization.
  • Own and continuously improve our Datadog-based database observability by building actionable dashboards, alerts, and service-level views; experience with a comparable observability stack (e.g., Prometheus, Grafana, New Relic) is relevant. Familiarity with pganalyze or Percona is a plus.
  • Automate system maintenance tasks using Bash, PowerShell, Python, or Ansible. Manage infrastructure as code (IaC) by writing Ansible playbooks. Some exposure to Terraform is a plus.

Benefits

  • Competitive salary: $120,000 - $160,000
  • Employer-sponsored health, dental, vision, life, and disability insurance
  • Retirement plan with company contribution
  • Annual company profit sharing
  • Personal development/training budget
  • Open, collaborative work environment
  • Extensive 2-week onboarding plan
  • Comprehensive mentorship program