Site Reliability Engineer

Scotiabank•Toronto, ON

7d•Onsite

About The Position

The Site Reliability Engineer works in a cross-functional technology team responsible for the bank's Customer data and contributes to the overall success of the team by ensuring that specific individual goals, plans, and initiatives are executed and delivered in support of the team’s business strategies and objectives. The incumbent ensures that all activities conducted comply with governing regulations, internal policies, and procedures. The incumbent should be a self-starter, able to work independently with little or no supervision, with strong communication skills, a client-focused mindset, and accountability for and ownership of tasks.

Requirements

3+ years of experience with an ETL platform such as iWay, Informatica, Talend, or DataStage.
3+ years of experience in Application Support, Log Monitoring, Debugging, and Incident Management processes.
3+ years of Unix Shell Scripting and prior experience with Java-based applications.
Highly analytical, with a good understanding of databases and technical architecture, and experience working with SQL.
Undergraduate Degree in Computer Science, Computer Engineering, or technical equivalent.
Excellent communication skills (verbal/written/presentation).

Responsibilities

Contributes to a customer-focused culture to deepen client relationships and leverage broader Bank relationships, systems, and knowledge.
Understands how the Bank’s risk appetite and risk culture should be considered in day-to-day activities and decisions.
Actively pursues effective and efficient operations of his/her respective areas in accordance with Scotiabank’s Values, its Code of Conduct and the Global Sales Principles, while ensuring the adequacy, adherence to and effectiveness of day-to-day business controls to meet obligations with respect to operational, compliance, AML/ATF/sanctions and conduct risk.
Drive reliability engineering strategy and platform standardization across teams.
Introduce chaos engineering practices to test system resilience.
Lead major incident management and stakeholder communication.
Mentor junior engineers and promote SRE best practices.
Ensure system reliability and uptime by designing, implementing, and maintaining highly available and fault-tolerant systems.
Monitor production systems using observability tools (e.g., Dynatrace, Grafana, Splunk) to proactively detect and resolve issues.
Develop and maintain automation for deployment, scaling, and incident response using scripts and Infrastructure as Code (IaC).
Manage incident response processes, including root cause analysis (RCA), postmortems, and continuous improvement of system resilience.
Improve observability and logging standards to enhance troubleshooting and system insights.
Experience coding in a professional environment, taking requirements from concept to production use.
Ability to work collaboratively and communicate clearly and concisely with both technical and non-technical audiences.
Troubleshoot production incidents and job failures, and provide support for production applications.
Analyze and resolve incident tickets assigned to the group according to severity and priority, and identify the root cause of each incident.
Ensure incident and change management processes are executed as mandated, partnering with various internal teams.
Provide Release Management support, including post-release health checks and monitoring of applications, and ensure timely communications to upstream and downstream teams.
Develop, document, and standardize plans and processes for preventive maintenance to ensure system stability and availability.
Provide after-hours support via an on-call pager on a rotational basis for production incidents, application releases during maintenance windows, and other maintenance activities.
Lead or support on-call rotations to maintain 24/7 service reliability and quick issue resolution.
Champions a high-performance environment and contributes to an inclusive work environment.