Sr. Machine Learning Ops Engineer

McKesson
Mississauga, ON
CA$99,100 - CA$132,100
Hybrid

About The Position

Join McKesson’s growing AI/ML team and play a critical role in operationalizing machine learning and Generative AI solutions at scale. This role focuses on deploying, standardizing, and maintaining production-ready ML and agentic AI systems, enabling consistent, reliable, and optimized delivery of data science innovations that support McKesson’s AIM28 strategic initiatives.

Requirements

  • Strong experience deploying ML models into production environments
  • Hands-on expertise with CI/CD pipelines, monitoring, and production ML systems
  • Experience with GenAI or agentic AI frameworks (LangChain, Semantic Kernel, etc.)
  • Knowledge of model observability, drift detection, and operational support
  • Experience working in scaling or early-stage ML environments
  • Proficiency with cloud platforms (AWS, Azure, or GCP)
  • Strong cross-functional collaboration skills (Data Science, Product, Architecture)
  • Ability to drive standardization, automation, and platform maturity
  • Focus on reliability, scalability, and optimization
  • Degree or equivalent, typically with 7+ years of relevant experience

Nice To Haves

  • Experience with Databricks ecosystem (e.g., Databricks Genie)
  • Familiarity with LangChain, LangGraph, or Microsoft Semantic Kernel
  • Exposure to GenAI cost optimization / FinOps practices
  • Experience implementing secure enterprise applications (e.g., Okta)
  • Experience in healthcare or regulated environments
  • Experience scaling ML/AI capabilities from experimentation to production maturity

Responsibilities

  • Lead deployment and operationalization of ML models and GenAI/agentic solutions, ensuring scalability, reliability, and performance
  • Partner with Data Scientists to identify and automate high-impact model use cases, building end-to-end pipelines (CI/CD, monitoring, alerting)
  • Define and enforce standardized deployment patterns and runbooks across teams
  • Own KTLO (keep-the-lights-on) operations for ML and GenAI systems, including health monitoring, logging, and performance tracking
  • Design and implement pipelines for batch, real-time, and event-driven inference
  • Establish observability frameworks (monitoring, logging, lineage, alerting)
  • Enable deployment of agentic AI solutions using tools such as LangChain, LangGraph, Semantic Kernel, and Databricks tools
  • Ensure secure deployment of applications with proper access controls (e.g., Okta integration)
  • Drive cost and performance optimization across ML and GenAI workloads
  • Partner with architecture, compliance, governance, and legal teams to meet enterprise standards
  • Conduct ongoing research into emerging tools and technologies to improve deployment practices
  • Guide and influence architectural decisions while maintaining clear separation between platform and deployment ownership

Benefits

  • Competitive compensation package
  • Annual bonus
  • Long-term incentive opportunities