Senior AI Platform Engineer

Recrute Action • Toronto, ON

2d • Hybrid

About The Position

Build and support a global AI platform in the insurance industry using Azure cloud infrastructure, AI tools and services, and DevOps technologies. This hybrid Toronto-based role focuses on platform engineering, automation, and operational support within a rapidly evolving AI environment supporting enterprise-scale systems.

Requirements

Bachelor’s degree in Computer Science, Computer Engineering, or a related technical field.
5–7 years of experience in backend, platform, or cloud systems engineering, including experience using Jenkins, GitHub, and Terraform.
Proficiency with Python and at least one of Java, Scala, TypeScript, or a similar language for building backend services and automation; a working understanding of Java is expected.
Hands-on experience with Azure cloud infrastructure, including Azure Kubernetes Service (AKS), containers, and CI/CD.
Understanding of AI tools and services, including LLM systems, retrieval architectures, embeddings, vector stores, prompt or tool orchestration fundamentals, and AI/ML operations (MLOps).
Strong grasp of API design, asynchronous workflows, concurrency, and system reliability.
Familiarity with security, governance, and compliance concepts related to AI or data systems.
DevOps skills including GitHub, Jenkins, and Terraform.
Ability to collaborate across global teams, translate business problems into platform capabilities, and manage stakeholders effectively.
Strong communication skills and ability to support day-to-day AI platform operations.
Ability to work and take ownership in a fast-moving, evolving environment while helping shape foundational processes, tooling, and standards.
Eagerness to learn and grow with new technologies within the platform and AI ecosystem.
Ability to support a global program, including after-hours coverage across time zones.

Responsibilities

Build and operate AI platform services and abstractions that support diverse AI use cases with automation-first delivery.
Develop reusable reference patterns and inner-source components that meet reliability, security, and compliance standards.
Implement shared runtimes for multi-agent coordination, state management, memory persistence, and messaging.
Design interoperable APIs and SDKs used by data scientists and developers to build agent-powered applications.
Maintain and improve CI/CD pipelines and developer toolchains for AI services.
Evaluate emerging AI and ML infrastructure capabilities and introduce tools to improve developer productivity and reliability.
Develop and operate scalable backend services supporting high-traffic agent interactions, retrieval operations, and real-time execution flows.
Use cloud-native technologies including containers, orchestration, infrastructure as code, and CI/CD to deliver reliable and cost-efficient services.
Optimize runtime performance across CPU, GPU, and accelerator workloads.
Develop standardized retrieval frameworks including search, embeddings, and knowledge connectors.
Build and optimize short-term memory, long-term memory, and episodic state abstractions for agent workflows.
Integrate structured and unstructured data sources through unified connectors and retrieval bridges.
Build tool interfaces enabling agents to interact with enterprise systems, APIs, databases, and automations.
Create reusable patterns for tool definitions, schema validation, safe execution, rate limiting, and auditability.
Collaborate with regional teams to onboard systems and workflows into the global ecosystem.
Build and support platform capabilities and services that meet AI governance requirements.
Develop observability capabilities including traces, logs, action tracking, feedback loops, and performance metrics.
Provide mechanisms for feedback, oversight, and evaluation of agent behavior.
Build templates, scaffolding, and CLI tools to support development of AI-powered applications.
Collaborate with global engineering, security, and governance teams to support regulatory and data residency needs.
Mentor engineering and data science teams on platform capabilities and design patterns.
Contribute to documentation, playbooks, and enablement resources.