Site Reliability Engineer - Data Center

Amdocs, Plano, TX
Hybrid

About The Position

We are seeking an experienced Site Reliability Engineer (SRE) to join our Data Center Engineering team at Level 3. This role requires a technically strong and operationally mature engineer who will help design, scale, and maintain the reliability of our physical and virtual data center infrastructure. As a Level 3 SRE, you will be a technical leader responsible for ensuring system uptime, optimizing capacity and performance, and contributing to long-term infrastructure resiliency.

Requirements

  • Bachelor’s degree in Computer Engineering, Electrical Engineering, Information Technology, or a related technical field.
  • 4-7 years of experience in database administration and operations.
  • Experience participating in or leading incident response and postmortem analysis processes.
  • Experience as a PostgreSQL Database Administrator managing production and non-production PostgreSQL environments.
  • Skilled in backup and recovery, replication, performance tuning, and high availability.
  • Proven ability to troubleshoot critical issues, automate DBA tasks, and ensure database reliability.
  • 4+ years of hands-on PostgreSQL administration experience.
  • Strong SQL and PL/pgSQL expertise; experience with database optimization and indexing.
  • Hands-on experience with backup, recovery, and HA solutions.
  • Strong proficiency in Linux environments, particularly Debian.
  • Proficiency in scripting for database automation.
  • Excellent analytical, problem-solving, and troubleshooting skills.
  • Strong communication skills for cross-team collaboration.

Nice To Haves

  • Exposure to hybrid environments that integrate on-premises data centers with public or private cloud platforms.
  • Understanding of Oracle and MySQL databases is a plus, but not mandatory.

Responsibilities

  • Design, implement, and maintain PostgreSQL databases, including schema design, indexing strategies, query optimization, logical/physical replication, hot standby failover, and load balancing.
  • Develop and execute backup and recovery strategies, including pg_dump, pg_basebackup, WAL archiving, point-in-time recovery (PITR), and disaster recovery planning.
  • Monitor and optimize database performance, resource utilization, and storage growth using pg_stat_statements, EXPLAIN ANALYZE, pg_top, and Prometheus/Grafana dashboards; proactively troubleshoot performance bottlenecks.
  • Ensure database security through role-based access control (RBAC), audit logging with pgaudit, and compliance with regulatory standards.
  • Implement high availability (HA) and disaster recovery (DR) solutions using Patroni, streaming replication, synchronous/asynchronous replication, and failover orchestration.
  • Plan and execute database version upgrades and apply security or performance patches with minimal downtime, ensuring data integrity and performing compatibility checks.
  • Collaborate with application teams, BI developers, and ETL engineers to support data pipelines and optimize queries and workflow performance.
  • Implement monitoring and alerting solutions using Prometheus, Grafana, Zabbix, or Nagios to track database health, query latency, and resource usage.
  • Manage database user accounts, roles, and privileges to enforce security policies and regulatory compliance, including sudo/OS-level permissions for critical operations.
  • Conduct capacity planning, workload forecasting, and index/partition tuning to handle anticipated growth and high-concurrency workloads.
  • Automate database maintenance tasks using Python, Bash, or Ansible scripts, including schema migrations, routine checks, and patch deployment.
  • Document procedures, configurations, operational runbooks, and PostgreSQL best practices for team knowledge sharing.
  • Mentor and guide team members on PostgreSQL internals, replication setups, and performance tuning techniques.
  • Evaluate and recommend new database tools, extensions (such as TimescaleDB and pg_stat_statements), and best practices to improve efficiency, scalability, and resilience.

Benefits

  • You will be a key member of a global, dynamic, and highly collaborative team with many opportunities for personal and professional development.
  • Join us in our expanding organization, with ever-growing opportunities for personal growth and one of the highest scores of employee engagement in Amdocs!
  • We provide stellar benefits that range from health insurance to paid time off, sick leave, and parental leave!
  • Amdocs is an equal opportunity employer. We welcome applicants from all backgrounds and are committed to fostering a diverse and inclusive workforce.