Principal Platform Engineer

Scotiabank • Toronto, ON

Posted 1 day ago • Onsite

About The Position

The Principal Platform Engineer will play a critical role within the Enterprise Data & AI Technology organization, one of Scotiabank’s most significant enterprise-wide strategic initiatives. This organization drives data-enabled decision making, AI innovation, and technology modernization across the Bank. The Principal Platform Engineer will define and own the technical strategy, architecture, and operational excellence for the Data & AI platform(s) in alignment with the Bank’s Data & AI strategy. This role sets platform standards and guardrails, leads reliability and security improvements, and drives automation and enablement at scale. The Principal Platform Engineer partners across IAM, Network, Cloud Ops, Security, Data Governance, and client delivery teams to influence enterprise roadmaps, manage risk, and deliver new capabilities and modernization initiatives.

Requirements

Around 10 years of progressive IT experience in large, regulated enterprises operating across multiple geographies.
7+ years of hands‑on experience with Microsoft Azure, including architecture and deep expertise in networking, security, identity, storage, compute, and PaaS services.
5+ years of hands‑on Databricks on Azure experience (workspaces, jobs/workflows, clusters/SQL warehouses, Unity Catalog governance), including platform standards and guardrails.
7+ years using Infrastructure as Code (Terraform modules, Terraform Cloud/Enterprise; working knowledge of ARM/Bicep a plus), including establishing IaC standards and reusable blueprints.
7+ years with CI/CD and GitOps practices (Azure DevOps, GitHub Actions), including automated testing, security scanning, policy gates, and release/change controls.
Strong development and automation skills (Python required; Bash/PowerShell; Go optional) used to build platform tooling, self‑service enablement, and operational automation.
Proven experience designing secure, enterprise-grade Azure network and identity architectures (VNets, Private Endpoints, NSGs, UDRs, Azure Firewall, RBAC/PIM, workload identity) using zero‑trust principles.
Deep understanding of data platforms and integration patterns: Azure SQL, Cosmos DB, Databricks Lakehouse (Delta Lake, SQL Warehouses), ADLS Gen2, Event Hubs, and enterprise data governance controls.
Demonstrated ownership of SRE and incident management practices: SLOs/error budgets, on‑call readiness, major incident leadership, post‑incident reviews, and reliability improvements delivered through measurable outcomes.
Experience establishing observability standards (Azure Monitor, Log Analytics, dashboards, alerting) and driving performance/cost optimization at scale.
Strong stakeholder management and cross‑functional leadership skills; able to influence enterprise roadmaps, align priorities, and communicate tradeoffs to technical and non‑technical audiences.
Bachelor’s degree in Computer Science, Engineering, Mathematics, Management or a related field (or equivalent practical experience).

Responsibilities

Define the platform technical strategy and multi‑quarter roadmap for Azure & Databricks, aligning to enterprise architecture, security, and data governance standards. Identify capability gaps, prioritize investments, and drive adoption across delivery teams.
Own end‑to‑end architecture for Azure identity and access (RBAC, PIM, workload identities), and Databricks governance (Unity Catalog, workspace configuration, cluster policies). Establish reference architectures, design patterns, and reusable blueprints; ensure designs are compliant, resilient, and cost‑effective.
Define and evolve the platform operating model (intake, onboarding, support tiers, change management, controls evidence), including SLAs/SLOs and runbooks. Drive consistency across environments and delivery streams.
Establish error‑budget aware practices, incident severity models, and resilience engineering (autoscaling, retry/backoff strategies, capacity planning). Lead post‑incident reviews, ensure corrective actions are delivered, and continuously reduce toil and MTTR.
Design, build, and standardize observability across Databricks and Azure using Azure Monitor and Log Analytics. Deliver actionable dashboards and alerting for cluster/job health, audit events, performance, and cost insights; enable proactive detection and capacity management.
Design and develop reusable Terraform modules for Azure and Databricks (clusters, SQL warehouses, Unity Catalog objects), enabling consistent, scalable, and automated deployments via Terraform Cloud/Enterprise and CI/CD. Set IaC standards, review practices, and policy-as-code controls.
Own the Infrastructure & Platform release and change management approach, including approvals, change windows, automated validations, and rollbacks. Partner with Risk, Security, and Compliance to ensure auditability and control adherence.
Lead complex troubleshooting across Databricks jobs, clusters, SQL warehouses, and Azure dependencies. Drive performance tuning, capacity planning, and cost optimization (tagging/chargeback, cluster policies, autoscaling, right‑sizing) in partnership with Finance/Cloud Ops.
Build strong relationships with platform users and delivery teams. Communicate platform direction, constraints, and best practices; influence cross‑functional stakeholders (Platform, Security, Cloud Ops, Networking, Data Governance) to align priorities and accelerate adoption of standards.
Establish secure patterns for secret management using Azure Key Vault and HashiCorp Vault; integrate with Databricks secret scopes and workload identities. Enforce least‑privilege access, credential rotation, and secure-by-default platform configurations.
Partner with Microsoft and Databricks to plan upgrades, troubleshoot complex issues, evaluate new capabilities, and influence product/enterprise roadmaps while maintaining control compliance.
Provide hands‑on technical leadership across squads; mentor engineers on architecture, IaC, CI/CD, incident response, and operational excellence. Raise engineering standards through design reviews, documentation, and continuous improvement.

Benefits

Upskilling through online courses, cross-functional development opportunities, and tuition assistance.
Competitive Rewards program including bonus, flexible vacation, personal and sick days, and benefits that start on day one.
Free tea & coffee, universal washrooms, and lots of space for team collaboration.