Senior Platform Engineer – Data & AI

Equinix, Toronto, ON
$131,000 - $181,000, Onsite

About The Position

We are seeking a highly skilled Senior Platform Engineer – Data & AI to architect and build next-generation AI-native and Agentic platforms that power enterprise-scale data, automation, and intelligent systems. This role goes beyond traditional data platforms to focus on Agentic AI ecosystems, including multi-agent orchestration, agent lifecycle management, agent communication protocols, and AI-driven platform automation.

You will design and operate a unified platform that supports:

  • Data pipelines and real-time streaming
  • APIs and microservices
  • GenAI and LLM-powered applications
  • Agentic workflows and multi-agent systems

Working closely with AI/ML engineers, platform teams, SRE, and product teams, you will help build a scalable, observable, and governed AI platform on Google Cloud, leveraging automation, IaC, and modern cloud-native patterns.

Requirements

  • 8–12 years of experience in Platform Engineering, Data Engineering, Cloud Architecture, or AI Platform Engineering
  • Proven experience building enterprise-scale data and AI platforms
  • Strong programming expertise in Java, Python, and SQL, along with full-stack development
  • Experience building microservices and API-driven architectures
  • Deep understanding of distributed systems and cloud-native design
  • Strong experience with Google Cloud Platform (GCP) (mandatory)
  • Hands-on experience with: Kubernetes and containerized workloads, Terraform and Infrastructure-as-Code, CI/CD pipelines and GitOps
  • Experience with Kafka, Pub/Sub, Spark, Flink, or similar systems
  • Strong background in real-time and batch data processing
  • Hands-on experience with: LLM frameworks and APIs, Multi-agent orchestration frameworks (CrewAI, LangGraph, AutoGen, etc.), RAG pipelines and vector databases
  • Experience building or working with: Agent Gateway architectures, A2A communication models, MCP or context-sharing frameworks, Agent Development Kits (ADKs)
  • Experience building full stack applications with modern frontend frameworks (React, Angular, Vue.js).
  • Strong understanding of REST/GraphQL APIs and UI integration patterns.
  • Experience with real-time UI updates using WebSockets or streaming architectures.
  • Familiarity with design systems, UX principles, and responsive design.
  • Experience building platform dashboards, developer portals, or observability UIs.
  • Experience with observability tools: Prometheus, Grafana, OpenTelemetry
  • Strong debugging and system analysis skills.
  • Familiarity with AI/LLM observability and evaluation frameworks.
  • Experience with: Data catalogs and metadata platforms, Data quality and lineage frameworks, Semantic modeling and data governance

Nice To Haves

  • Experience with Vertex AI, MLflow, Kubeflow, or ML platforms.
  • Prior implementation of data mesh or data fabric architectures.
  • Experience with Looker Modeler / LookML or semantic layers.
  • Exposure to AI safety, governance, and responsible AI practices.
  • Experience building enterprise AI/Agentic platforms at scale.

Responsibilities

  • Architect and build cloud-native platforms on Google Cloud (GCP) supporting data, AI, and agentic workloads
  • Design event-driven architectures using Apache Kafka, Google Pub/Sub, or equivalent systems
  • Build scalable microservices and APIs using Java and modern frameworks (e.g., Spring Boot)
  • Develop and manage real-time and batch data pipelines using Airflow, Dataform, Dataflow, Spark, or similar tools
  • Implement Infrastructure-as-Code (IaC) using Terraform and Kubernetes for scalable, repeatable deployments
  • Enable platform automation using CI/CD, GitOps, and self-service frameworks
  • Ensure platform scalability, reliability, and cost efficiency
  • Design and build Agentic Platforms that support: Agent lifecycle management, Task orchestration, Context and memory handling
  • Develop and orchestrate multi-agent systems using frameworks such as CrewAI, LangGraph, AutoGen, or equivalent
  • Implement agent communication and coordination patterns across distributed systems
  • Build and integrate: Agent Gateway for managing agent interactions and routing, A2A (Agent-to-Agent) communication protocols, MCP (Model Context Protocol) or equivalent for context sharing and orchestration, ADKs (Agent Development Kits) or internal frameworks for rapid agent development
  • Enable use cases such as: Autonomous pipeline monitoring and remediation, AI-assisted platform operations, Intelligent workflow automation, Code and data pipeline generation
  • Integrate LLMs and GenAI services (e.g., OpenAI, Gemini, Claude) into platform workflows.
  • Build and support: RAG pipelines and retrieval systems, Vector search and embedding architectures (Weaviate, Pinecone, FAISS)
  • Enable AI-driven automation for: Platform operations, Data quality monitoring, Incident analysis and resolution
  • Develop reusable AI platform services and APIs for enterprise consumption.
  • Design and implement Agent Observability frameworks, including: Agent execution tracing, Decision tracking and explainability, Latency and performance monitoring, Failure and retry analysis
  • Integrate observability using tools like: OpenTelemetry, Prometheus, Grafana, AI/LLM observability tools (e.g., prompt tracing, evaluation frameworks)
  • Enable end-to-end observability across data pipelines, APIs, and agent workflows.
  • Lead initiatives in: Data modeling and semantic layer design, Data cataloging and metadata management, Data quality and lineage tracking
  • Implement governance frameworks using tools such as DataHub, Collibra, or equivalent.
  • Support data mesh and data fabric architectures for federated data ownership.
  • Build automation-first platforms leveraging: AI-driven workflows, Self-healing systems, Event-driven automation
  • Use GenAI to: Automate operational tasks, Generate platform configurations and code, Enhance developer productivity
  • Collaborate with SRE and Production Support teams to improve: Reliability, Incident response, Operational efficiency
  • Develop platform SDKs, CLIs, and reusable blueprints.
  • Enable self-service platform capabilities for engineering teams.
  • Standardize best practices for: APIs, Data pipelines, Agent development
  • Mentor engineers and promote a culture of innovation and continuous learning.

Benefits

  • Employee Assistance Program
  • Healthcare coverage that is designed to complement the provincial healthcare system
  • Life, disability and optional benefit plans
  • Defined Contribution Pension Plan (DCPP)
  • Group Registered Retirement Savings Plan (RRSP)
  • Tax-Free Savings Account (TFSA)
  • Vacation and personal time
  • Various paid holidays