Site Reliability Engineer

Scotiabank, Toronto, ON
Onsite

About The Position

The Site Reliability Engineer will work in a cross-functional technology team responsible for the bank's Customer data and will contribute to the team's overall success by ensuring that specific individual goals, plans, and initiatives are executed and delivered in support of the team’s business strategies and objectives. The incumbent ensures that all activities comply with governing regulations, internal policies, and procedures. The incumbent should be a self-starter who can work independently with little or no supervision, with strong communication skills, a client-focused mindset, and accountability for and ownership of tasks.

Requirements

  • 3+ years of experience with an ETL platform such as iWay, Informatica, Talend, or DataStage.
  • 3+ years of experience in Application Support, Log Monitoring, Debugging and Incident Management process.
  • 3+ years of Unix shell scripting and prior experience with Java-based applications.
  • Highly analytical, with a good understanding of databases and technical architecture and experience working with SQL.
  • Undergraduate degree in Computer Science, Computer Engineering, or a technical equivalent.
  • Excellent communication skills (verbal/written/presentation).

Responsibilities

  • Contributes to a customer focused culture to deepen client relationships and leverage broader Bank relationships, systems, and knowledge.
  • Understand how the Bank’s risk appetite and risk culture should be considered in day-to-day activities and decisions.
  • Actively pursues effective and efficient operations of their respective areas in accordance with Scotiabank’s Values, its Code of Conduct and the Global Sales Principles, while ensuring the adequacy of, adherence to, and effectiveness of day-to-day business controls to meet obligations with respect to operational, compliance, AML/ATF/sanctions and conduct risk.
  • Drive reliability engineering strategy and platform standardization across teams.
  • Introduce chaos engineering practices to test system resilience.
  • Lead major incident management and stakeholder communication.
  • Mentor junior engineers and promote SRE best practices.
  • Ensure system reliability and uptime by designing, implementing, and maintaining highly available and fault-tolerant systems.
  • Monitor production systems using observability tools (e.g., Dynatrace, Grafana, Splunk) to proactively detect and resolve issues.
  • Develop and maintain automation for deployment, scaling, and incident response using scripts and Infrastructure as Code (IaC).
  • Manage incident response processes, including root cause analysis (RCA), postmortems, and continuous improvement of system resilience.
  • Improve observability and logging standards to enhance troubleshooting and system insights.
  • Experience coding in a professional environment, taking requirements from concept to production use.
  • Ability to work collaboratively and communicate clearly and concisely with both technical and non-technical audiences.
  • Troubleshoot production incidents, job failures, and provide support for production applications.
  • Analyze and resolve incident tickets assigned to the group based on severity and priority and identify the root cause for resolution.
  • Ensure incident and change management processes are executed as mandated, partnering with various internal teams.
  • Provide Release Management support, including post-release health checks and monitoring of applications, and ensure timely communications to upstream and downstream teams.
  • Develop, document and standardize plans and processes for preventive maintenance steps to ensure system stability and availability.
  • Provide after-hours support via an on-call pager on a rotational basis for production incidents, application releases during maintenance windows, and other maintenance activities.
  • Lead or support on-call rotations to maintain 24/7 service reliability and quick issue resolution.
  • Champions a high-performance environment and contributes to an inclusive work environment.