Senior Database Reliability Engineer

Rithum
Seattle, WA · Remote
$90,000 - $140,000

About The Position

Rithum™ is the world’s most trusted commerce network, accelerating how brands, suppliers, and retailers work together to deliver seamless e-commerce experiences. We provide an unmatched platform for brands and retailers, enabling them to accelerate growth, optimize operations across channels, scale product offerings, and enhance margins. Today, more than 40,000 companies trust Rithum to grow their business across hundreds of channels, representing over $50 billion in annual GMV. Using our commerce, marketing, and delivery solutions, our customers create optimized consumer shopping journeys from beginning to end.

Overview

At Rithum, the Database Reliability Engineering (DBRE) team is responsible for the availability, reliability, and observability of all database systems. We rely heavily on automation to reduce manual work and are always looking for ways to improve our processes. We currently manage and optimize a large-scale SQL Server environment spanning hundreds of instances across hybrid infrastructure (on-premises VMware and AWS), as well as installations of various relational and NoSQL database platforms, including MongoDB, DynamoDB, Elasticsearch, MySQL, Postgres, and Redis. These database systems touch all aspects of the business.

On the DBRE team, we have a strong culture of curiosity, honesty, collaboration, and constant learning. As a Senior Database Reliability Engineer, you are expected to embody these values and help foster them among your fellow team members. You are responsible for managing a variety of database systems and for designing and leading your own projects with a highly technical approach.

Requirements

  • 3+ years of hands-on experience managing database systems
  • Basic understanding of multi-tenant, database-driven applications
  • Familiarity with common data storage technologies and use cases
  • Proficiency in a common high-level language such as PowerShell or Python
  • Design, implement, and support SQL Server Always On Availability Groups, clustering, and transactional replication
  • Lead and execute major upgrade initiatives (e.g., SQL Server version upgrades, compatibility level transitions)
  • Own disaster recovery validation and failover strategy execution
  • Investigate and resolve complex replication and distribution database failures

Nice To Haves

  • 5+ years of hands-on experience administering production database systems at scale
  • Bachelor’s degree in a related field
  • Experience with database high-availability technologies
  • Experience with MongoDB (including replica sets and Atlas), DynamoDB, or other distributed NoSQL platforms
  • Familiarity with cloud-native database architectures (AWS preferred)
  • Exposure to data platform modernization initiatives (e.g., migration to newer engine versions, consolidation, cloud adoption)
  • Knowledge of relational database design concepts in an OLTP environment
  • Experience managing database systems in a cloud environment
  • Strong problem-solving and analytical skills
  • Excellent communication and teamwork abilities
  • Commitment to continuous learning and professional development

Responsibilities

  • Ensure maximum availability and reliability of mission-critical database systems across hybrid infrastructure.
  • Design, implement, and maintain SQL Server Always On Availability Groups, clustering, and replication topologies.
  • Lead major database upgrade initiatives and modernization efforts.
  • Support other engineers and teams in their use of database systems.
  • Continuously improve the observability of all database systems using telemetry, performance analysis, and proactive monitoring.
  • Continuously improve processes through automation, automating operational workflows with PowerShell, Python, and CI/CD tooling.
  • Ensure all data is protected and secure.
  • Participate in our on-call rotation.
  • Troubleshoot and tune high-load production systems, including complex performance and replication issues.
  • Lead technical response during high-severity incidents and conduct root cause analysis.
  • Ensure database security, backup integrity, and disaster recovery readiness.
  • Contribute to the development of best practices for database engineering and reliability.
  • Collaborate cross-functionally to design scalable, resilient data architectures.
  • Mentor team members and contribute to engineering best practices.

Benefits

  • Medical, dental and vision benefits: Affordable health care plans and company HSA contributions, starting on Day 1
  • A 6% 401(k) match
  • Competitive time off package with 20 days of Paid Time Off, 9 Company-Paid holidays, 2 paid floating holidays, 7 paid sick days, 2 Wellness days, and 1 Paid Volunteer Day; at 3 years of service PTO increases to 22 days, and at 5 years it increases to 25 days
  • 12 weeks primary caregiver leave & 4 weeks secondary caregiver leave
  • Accident, critical illness, and hospital indemnity insurance
  • Pet insurance
  • Legal assistance and identity theft insurance plans
  • Life insurance 2x salary
  • Access to the Calm app and the Employee Assistance Program
  • $65/month remote work stipend for internet
  • Culture and team-building activities
  • Tuition assistance
  • Career development opportunities
  • Charitable contribution match up to $250 per year