Sr. Site Reliability Administrator

Open Text Corporation, Richmond Hill, ON

About The Position

As a Senior Site Reliability Engineer, you will play a critical hands-on role within a globally distributed SRE team focused on ensuring the reliability, performance, and stability of the data services that power our customer-facing SaaS platforms. You’ll work across key components of our distributed systems ecosystem, including Kafka, Elasticsearch, Cassandra, Solr, Redis, and OpenSearch, supporting both on-premises infrastructure and public cloud environments (AWS, Azure, GCP). This role is well suited to engineers who thrive on solving complex operational challenges, driving automation, and improving system reliability at scale. You’ll collaborate closely with cross-functional teams, strengthen your expertise in distributed systems, and contribute to building resilient, high-quality services that power critical applications for customers around the world.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field - or equivalent practical experience.
  • 4+ years of experience in Information Technology supporting large‑scale enterprise systems.
  • 2+ years operating or supporting distributed data platforms (e.g., Kafka, Elasticsearch, Cassandra, Solr, Redis, OpenSearch).
  • 2+ years working with automation and configuration tools such as Terraform and Ansible.
  • Strong knowledge of Linux systems administration.
  • Experience working with public cloud infrastructure (AWS, Azure, or GCP).
  • Solid troubleshooting skills and ability to resolve complex technical problems.
  • Self‑driven, detail‑oriented, and able to manage multiple tasks in a fast‑moving environment.

Nice To Haves

  • Familiarity with ITIL processes; certification is a plus.
  • Experience with observability tools (Prometheus, Zabbix, Grafana, New Relic, etc.) is a plus.

Responsibilities

  • Operate, maintain, and scale distributed data platforms including Kafka, Elasticsearch, Cassandra, Solr, Redis, and OpenSearch across on-premises and public cloud environments (AWS, Azure, GCP).
  • Build, enhance, and support infrastructure using Infrastructure-as-Code (IaC) tools such as Terraform and Ansible.
  • Perform patching, upgrades, and routine maintenance to ensure systems remain secure, stable, and compliant with internal standards.
  • Collaborate with SRE and engineering teams to design, deploy, and monitor data platforms and supporting infrastructure.
  • Participate in incident response, troubleshooting, and root cause analysis, while contributing to a 24x7 on-call rotation for critical services.
  • Support capacity planning, performance tuning, and system health assessments to ensure optimal reliability and scalability.
  • Develop and maintain technical documentation, including operational procedures, change plans, and incident reports.
  • Contribute to automation and reliability improvements, support service requests, help meet SLA/OLA commitments, and actively participate in team knowledge sharing and training initiatives.

Benefits

  • At OpenText, we offer a thoughtfully designed benefits package that supports your physical, emotional, and financial wellbeing.
  • As you move through the hiring process, we’re happy to provide more details about our compensation programs, including variable and commission compensation opportunities for eligible roles, vacation entitlement, and paid time off.