Application Monitoring & Observability Lead

McKesson • Montreal, QC

$116,300 - $193,800

About The Position

Lead the definition and implementation of a consistent, enterprise-wide approach to application monitoring, observability, and operational support. This role drives standardized practices, tooling, and proactive operations to improve system reliability, visibility, and performance across all teams.

Requirements

Deep expertise in application support, monitoring, observability, and operations
Proven experience implementing enterprise-wide standards and cross-team practices
Strong understanding of cloud platforms, infrastructure, and enterprise applications
Experience with automation, scripting, and operational optimization
Strong leadership skills and the ability to influence without direct authority
Ability to drive change, challenge existing practices, and improve operational maturity
Results-oriented mindset focused on reliability, performance, and continuous improvement
Degree or equivalent, typically with 10+ years of relevant experience. Fewer years may be required for candidates with a relevant Master’s or Doctorate.

Nice To Haves

Experience with modern observability platforms (e.g., Datadog, Dynatrace, Splunk, New Relic)
Knowledge of SRE practices, SLIs/SLOs, and reliability engineering principles
Experience in large-scale enterprise or highly regulated environments
Familiarity with CI/CD, DevOps, and cloud-native architectures
Prior experience driving enterprise transformations or platform standardization

Responsibilities

Define and implement a unified monitoring and observability strategy across all applications
Standardize tools, metrics, alerting thresholds, and practices across engineering and operations teams
Establish centralized dashboards to provide full visibility into system health and performance
Identify gaps in monitoring coverage and implement improvements
Develop proactive detection mechanisms to identify anomalies and performance degradation
Reduce incidents through preventive, data-driven practices
Define and track key operational metrics such as availability, performance, reliability, and recovery time
Lead root cause analysis and ensure sustainable corrective actions
Drive automation across monitoring, alerting, and operational workflows
Guide development of scripts, automated runbooks, and self-healing capabilities
Improve efficiency by reducing manual intervention
Define and enforce operational standards, governance models, and accountability frameworks
Ensure compliance with security, audit, and performance requirements
Align teams to consistent, scalable operational practices
Act as a cross-functional leader across development, QA, operations, and support teams
Partner with service managers and delivery teams to drive adoption of best practices
Lead organizational change and promote a culture of operational excellence
Deliver executive-level dashboards and insights on system performance
Analyze trends and recommend strategic improvements
Lead continuous improvement initiatives and ensure long-term adoption of best practices