Senior Manager, Service Reliability

Scotiabank•Toronto, ON

2d•Hybrid

About The Position

Global Banking and Markets Engineering (GBME) is seeking a Senior Manager, Service Reliability to enhance the reliability and stability of capital markets products and analytics platforms. The ideal candidate has a strong analytical and engineering background and will collaborate with Site Reliability Engineers (SREs) and with application, infrastructure, and operations teams to ensure swift incident resolution, address root causes, and implement corrective actions. The role involves ensuring stability, reducing production incidents, leading incident response and root cause analysis, enforcing SLOs, owning problem management, identifying opportunities to improve resiliency and availability, and translating technical data into business impact narratives for executive audiences.

Requirements

10+ years of experience in technology operations, reliability engineering, production support, or service management roles.
Demonstrated experience leading problem management and incident reduction initiatives.
Strong understanding of SRE concepts, production operations, and service reliability practices.
Proven people leadership experience, including managing analysts and offshore teams.
Experience operating in high-availability, mission-critical environments (Capital Markets experience strongly preferred).
Strong executive communication skills, including the ability to present KPIs, trends, and risk to senior leadership.
Ability to influence across multiple teams without direct authority.
Experience with ITSM and incident/problem management tooling (e.g., ServiceNow).

Nice To Haves

Capital Markets or financial services technology experience.
Familiarity with regulatory and operational resilience expectations.
Experience driving enterprise-wide operational improvement initiatives.
Strong analytical and data-driven decision-making skills.
Experience working with real-time, high-availability, and low-latency systems.

Responsibilities

Ensure stability and drive measurable reductions in production incidents and recurring incidents across GBME.
Lead incident response, root cause analysis, and postmortems; enforce SLOs and reliability practices.
Drive accountability across technology teams to complete corrective and preventative actions.
Proactively identify opportunities to improve resiliency, availability, and operational readiness.
Translate technical data into business impact and risk narratives for executive audiences.

Benefits

Upskilling through online courses, cross-functional development opportunities, and tuition assistance.
Competitive Rewards program including bonus, flexible vacation, personal and sick days, and benefits that start on day one.
Opportunities for community engagement and belonging through our various programs.
