Lead Platform Engineer - Global AI Platform

Manulife | Toronto, ON
$113,260 - $210,340 | Hybrid

About The Position

As a Lead Platform Engineer, you will help design, build, and deliver the foundational services that enable secure, scalable, and reusable AI and multi-agent workflow solutions across the organization. You will work across cloud, distributed systems, orchestration frameworks, vector/memory stores, agent workflows, and safety/guardrail layers to keep the platform enterprise-ready.

Requirements

  • 7–10+ years in software engineering; 3+ years leading teams/projects in AI/ML or distributed systems.
  • Strong expertise in Akka (Actors, Streams, Cluster, Typed) and event-driven microservices at scale.
  • Hands-on experience with AI Foundry and AdaptiveML (or equivalent platforms for model lifecycle, orchestration, and continuous learning).
  • Proficiency in Scala or Java (Akka ecosystem), plus Python for ML tooling.
  • Experience with stream processing and data pipelines.
  • Solid MLOps background: model registries, feature stores, CI/CD for ML, containerization (Docker), orchestration (Kubernetes).
  • Cloud proficiency (AWS/Azure), Terraform or IaC, and secrets/IAM.
  • Deep understanding of distributed systems, observability stacks, and resilience patterns.
  • Strong communication, documentation, and stakeholder management skills.

Nice To Haves

  • Experience with online learning, reinforcement learning, or active learning in production.
  • Knowledge of responsible AI frameworks, model risk management, and fairness/bias assessment.
  • Performance optimization for low-latency inference; GPU/accelerator utilization.
  • Experience in regulated industries (e.g., financial services/insurance) with audit and governance requirements.

Responsibilities

  • Build and maintain high-performance, fault-tolerant, secure, and scalable AI platform services and abstractions that support diverse AI solutions with automation-first delivery.
  • Design, build, and maintain the technology platform's features and infrastructure, including hardware, software, and network components.
  • Build scalable microservices and event-driven pipelines for model training/inference using Akka Streams and Cluster Sharding.
  • Integrate AdaptiveML workflows for continuous/online learning, feature stores, model registries, and A/B experimentation.
  • Implement AI Foundry components for orchestration, feature engineering, model deployment, and governance.
  • Develop reusable reference patterns and inner-source components that meet reliability, security, and compliance standards.
  • Implement shared runtimes for multi-agent coordination, state management, memory persistence, and messaging.
  • Design interoperable APIs/SDKs used by data scientists and developers to build agent-powered applications.
  • Maintain and improve CI/CD pipelines and developer toolchains for AI services to enable rapid, compliant delivery.
  • Evaluate emerging AI/ML infrastructure capabilities; prototype and introduce tools that improve developer productivity and reliability.
  • Develop and operate scalable backend services supporting high-traffic agent interactions, retrieval operations, and real-time execution flows.
  • Use cloud-native technologies (containers, orchestration, IaC, CI/CD) to deliver reliable, cost-efficient services.
  • Optimize runtime performance across CPU/GPU/accelerator workloads.
  • Monitor and resolve persistent platform issues surfaced by technical support teams, such as bottlenecks, connectivity problems, and system failures.
  • Consider compliance and regulatory requirements throughout the platform lifecycle.
  • Implement security measures, such as access controls, encryption, and vulnerability assessments, when applicable.
  • Partner with architects and business leaders to design and build robust platforms across all Global AI Platform capability layers.

Benefits

  • Health, dental, mental health, vision, short- and long-term disability, life and AD&D insurance coverage, adoption/surrogacy and wellness benefits, and employee/family assistance plans.
  • Various retirement savings plans (including pension and a global share ownership plan with employer matching contributions) and financial education and counseling resources.
  • A generous paid time off program in Canada that includes holidays, vacation, personal, and sick days, plus the full range of statutory leaves of absence.