Senior Platform Engineer

Scotiabank, Dallas, TX

About The Position

The Senior Platform Engineer will be responsible for building, tuning, and managing infrastructure; DevOps and platform site reliability; and monitoring, troubleshooting, enhancing, and enabling new features on the Data & AI platform(s), in line with the bank's Data & AI strategy. This involves working with cross-functional teams such as IAM, Network, Cloud Ops, Security, and client partners on integration, process automation, platform enhancement, and delivery of new projects.

Requirements

  • 15+ years of IT experience in large organizations operating across multiple geographies and regulatory regimes.
  • 5+ years of hands-on experience with Microsoft Azure (networking, security, identity, storage, compute, PaaS).
  • 5+ years with Databricks on Azure (workspaces, jobs/workflows, clusters/SQL warehouses, Unity Catalog governance).
  • 5+ years using Infrastructure as Code (Terraform modules, Terraform Cloud/Enterprise; working knowledge of ARM/Bicep a plus).
  • 5+ years with CI/CD (Azure DevOps, GitHub Actions), including automated testing, security scanning, and policy gates.
  • 5+ years with development/scripting languages (Python, plus Bash/PowerShell; Go optional) for automation and platform tooling.
  • 5+ years with container technologies (Docker, orchestration on AKS or containerized jobs on Databricks/Functions).
  • Strong understanding of Azure networking (VNets, subnets, Private Endpoints, NSGs, UDRs, Azure Firewall), RBAC/PIM, and zero-trust principles.
  • In-depth knowledge of databases and data platforms: Azure SQL, Cosmos DB, Databricks Lakehouse (Delta Lake, SQL Warehouses), and data integration patterns (Event Hubs, ADLS Gen2).
  • Comprehensive understanding of SDLC and GitOps (branching, environments, code review, release promotion).
  • Experience with config management and automation (Ansible, Bash/PowerShell) and governance via cluster policies and IaC standards.
  • Bachelor’s degree in Computer Science, Engineering, Mathematics, Management, or a related field.

Responsibilities

  • Provide clear direction to the team, set goals, and keep the team accountable for their deliverables. Align team goals with the overall direction of the Azure & Databricks Platform roadmap and enterprise standards.
  • Own the technical direction across Azure and Databricks: Azure networking and security architecture (VNets, Private Endpoints, NSGs, route tables, Azure Firewall), Azure Identity & Access Management (RBAC, PIM), and Databricks platform governance (Unity Catalog, workspace configuration, cluster policies). Ensure best practices for reliability, cost, and security are consistently applied.
  • Ensure high-quality support delivery for platform users; adhere to platform SLAs/SLOs and service objectives.
  • Continually improve platform processes and SOPs for efficiency and automation. Design and develop reusable Terraform modules for Azure-native resources and Databricks (clusters, SQL warehouses, Unity Catalog objects), enabling consistent, scalable, and automated deployments via Terraform Cloud/Enterprise and CI/CD.
  • Build strong relationships with data engineers, analysts, and platform users. Communicate proactively with stakeholders and cross-functional teams (Platform, Security, Cloud Ops, Networking, Data Governance) to align priorities, manage expectations, and drive adoption of platform standards.
  • Troubleshoot and resolve performance issues across Databricks jobs, clusters, SQL warehouses, and Azure dependencies. Implement Azure Monitor and Log Analytics-based observability with custom dashboards for cluster/job health, driver/executor metrics, and cost insights. Establish proactive alerting and early issue detection via logs/metrics for Databricks and Azure services.
  • Analyze, triage, and resolve platform issues promptly to achieve SLOs and platform reliability objectives. Drive error-budget-aware practices, post-incident reviews, and resilience engineering (e.g., autoscaling, retry/backoff strategies, policy guardrails).
  • Provide support during major incidents, including after-hours support. Lead incident response, communications to users and stakeholders, and root-cause analysis with clear action items and follow-through.
  • Design, build, and deploy logging/monitoring solutions for early detection and actionable insights. Standardize ingestion to Log Analytics from Databricks (audit logs, cluster events, job runs) and key Azure resources; build dashboards and alert rules to reduce MTTR.
  • Maintain and enhance the Infrastructure & Platform release pipeline using Terraform, Terraform Cloud, Azure DevOps and/or GitHub Actions, with source control in GitHub/Bitbucket and artifact promotion via ACR/Artifacts. Enforce approvals, change windows, and automated checks to ensure safe, repeatable releases.
  • Implement CI/CD for infrastructure and analytics workloads using Terraform, Docker, Azure DevOps/GitHub Actions, and Artifact/Container registries. Automate Terraform plan/apply, Databricks Bundle releases, policy validation, and security scanning to streamline delivery and ensure compliance.
  • Set up Azure Key Vault and HashiCorp Vault for secret management; integrate with Databricks secret scopes and workload identities. Enforce least-privilege access via Azure RBAC and rotate credentials per policy.
  • Partner with Microsoft and Databricks support and product teams to fine-tune and troubleshoot components, plan upgrades, and adopt new capabilities aligned to the roadmap and enterprise controls.
  • Mentor junior engineers in best practices for building, deploying, testing, and supporting services on Azure and Databricks. Promote a culture of automation, documentation, and continuous learning.