Cloud Site Reliability Engineer

Imagine Communications · Toronto, ON

About The Position

Every day, Imagine Communications delivers billions of media moments all over the world: anywhere, anytime, and on any device. The company provides innovative, end-to-end media software and networking solutions to more than 3,000 customers in over 185 countries, including the top broadcast facilities and the most technologically advanced sports and live-event venues.

A Bit About The Role

The Site Reliability Engineer (SRE) will apply deep expertise in DevOps practices, automation, infrastructure orchestration, configuration management, and continuous integration to support the delivery and operation of mission‑critical applications. This role focuses primarily on the development, deployment, and reliability of the xGPlatform and its associated peripheral services. The SRE will play a key role in advancing Imagine Communications toward a robust, multitenant, multi‑cloud product strategy.

The ideal candidate brings a strong background in, and passion for, software development, DevOps, and cloud technologies. This individual will design and build scalable systems using a diverse technology stack that includes AWS, Azure, Node.js, C#, and modern deployment automation and tooling. In addition to building reliable services, the SRE will empower engineering teams to work more efficiently and effectively. Success in this role requires a highly motivated, collaborative communicator who thrives in a fast‑paced environment and contributes to Agile, cloud‑native development practices.

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related technical field.
6+ years of professional experience in Site Reliability Engineering, DevOps, Cloud Engineering, or Software Development roles supporting production systems.
Strong understanding of cloud architecture principles, including scalability, resiliency, high availability, security, and cost optimization.
Hands‑on experience designing, deploying, and operating applications and infrastructure in AWS and/or Azure.
Proficiency with infrastructure‑as‑code and cloud‑native technologies (e.g., Terraform, Ansible, Docker, Kubernetes, Prometheus, messaging or caching systems).
Extensive experience with monitoring, logging, and observability tools and practices.
Proven ability to troubleshoot and resolve complex production issues, including ownership of Tier‑3 incidents and root cause analysis.
Experience integrating systems using Web APIs, messaging, or event‑driven architectures.
Working knowledge of SQL and NoSQL databases, including schema design, querying, and operational considerations.
Experience working in Agile and DevOps environments.
Strong communication and collaboration skills, with the ability to work effectively across engineering, architecture, and business teams.

Nice To Haves

Experience operating and supporting mission‑critical, customer‑facing, or managed service platforms.
Experience leading or contributing to incident response, post‑incident reviews, and reliability improvements.
Familiarity with SRE practices such as service level indicators (SLIs) and service level objectives (SLOs).
Experience identifying and reducing operational toil through automation and process improvement.
Experience contributing to platform architecture decisions or reusable cloud deployment patterns.
Hands‑on experience with infrastructure and delivery tools such as Terraform, Ansible, or Azure DevOps.
Experience with programming or scripting languages such as Go, JavaScript (Node.js), PowerShell, Python, or shell.
Exposure to cost management, capacity planning, and performance optimization in cloud environments.
Familiarity with cloud security and compliance standards such as SOC 2.
Relevant industry certifications (or progress toward certification), such as AWS Certified Solutions Architect or AWS Certified DevOps Engineer.
Flexibility to adjust working hours to accommodate co-workers and customers in different time zones.

Responsibilities

Design, build, deploy, and operate applications and infrastructure across AWS, Azure, and other cloud service providers as required.
Manage and maintain development, staging, and production environments using infrastructure‑as‑code and automation best practices.
Design and implement systems and tooling that improve the reliability, scalability, security, and supportability of Imagine’s Managed Services offerings.
Promote DevOps and cloud best practices within the team to improve quality, reduce operational risk, increase security, drive efficiency and reuse, and optimize costs.
Collaborate with product, architecture, and business stakeholders to understand user needs and translate them into reliable, scalable technical solutions.
Integrate and orchestrate diverse cloud services and internal systems using Web APIs and event‑driven architectures.
Architect, document, and review system designs with a strong focus on security, resiliency, and operational excellence.
Build and integrate cloud‑based services and automation to improve workforce productivity and reduce manual operational effort.
Partner with architecture and development teams to design reusable deployment patterns and establish governance and observability models.
Apply cloud compliance, security, and reliability standards to application and platform design.
Lead the investigation, troubleshooting, and resolution of Tier‑3 production incidents and escalations, contributing to root cause analysis and continuous improvement.