DevOps Solutions Architect

Teknion, Toronto, ON
CA$110,000 - CA$130,000

About The Position

The DevOps Solutions Architect is a member of the IT Enterprise Applications team, focusing on DevOps, SRE, and AI Ops. DevOps focuses on the end-to-end application lifecycle. SRE focuses on delivery and the stability of the production environment. AI Ops is centered on the deployment, oversight, and monitoring of AI-specific elements. You are responsible for the smooth operation of Teknion’s Enterprise Applications infrastructure. You have an essential role in integrating the various project solutions within the existing application and infrastructure. You will interface directly with senior technology leaders to transform business and technology capabilities. You will be a dedicated contributor to senior leaders as they define a target state and roadmaps and identify new and emerging technologies that will transform and optimize the business.

Requirements

  • Bachelor’s degree in information technology, software engineering, computer science, or a related field.
  • Proven experience in engineering and software architecture design.
  • Must be self-motivated and driven, with a strong ability to work with internal resources and vendors.
  • Experience in managing Virtual Private Clouds (OSI Transport layer and above)
  • Experience with cloud services like EC2, S3, Azure VMs, Kubernetes Engine, etc.
  • Understanding of cloud networking, security, and infrastructure as code.
  • Expertise in Docker for containerizing applications.
  • Experience with Kubernetes or other orchestration tools for managing containerized workloads.
  • Familiarity with tools like Ansible for automating system configurations.
  • Expertise in CI/CD tools: Jenkins, GitHub Actions.
  • Proficiency in scripting languages like Python, Bash, or PowerShell.
  • Understanding of programming concepts for building automation tools.
  • Strong understanding of RHEL 8 (& above) and/or Windows Server.
  • Networking knowledge, including TCP/IP, DNS, and load balancing.
  • Experience with Okta, Active Directory, and Azure Active Directory.
  • Experience with monitoring tools like Prometheus, Grafana, New Relic or Datadog.
  • Familiarity with logging tools like ELK stack (Elasticsearch, Logstash, Kibana) or Splunk.
  • Git & GitHub
  • Cloudflare WAF, Cloudflare Reverse Proxy, Cloudflare Tunnel
  • Postgres, OpenEdge, NoSQL, MongoDB, SQL Server
  • Fundamental understanding of Generative AI principles, foundation models (LLMs), tokenization, and basic prompt engineering lifecycle concepts.
  • Basic familiarity with configuring vector databases or semantic caching mechanisms alongside standard database systems like Postgres, NoSQL, or MongoDB

Responsibilities

  • Design and implement end-to-end, highly scalable and resilient solutions for infrastructure and application services within Teknion’s hybrid clouds
  • Design, implement, and manage Continuous Integration / Continuous Delivery (CI/CD) pipelines to automate the build, test, and deployment processes
  • Automate the configuration and management of systems and applications
  • Design, implement and manage Source Code Control (GitHub); manage software components and build artifacts in a repository manager integrated into CI/CD pipelines
  • Build test automation driven by Test-Driven Development strategies, in partnership with development, to increase code quality and confidence
  • Manage application security posture protecting APIs & Web Applications at the edge
  • Manage the deployment and configuration of application-based definitions
  • Integrate WAF into CI/CD pipelines to ensure security is built into the development process
  • Align & implement WAF policies with industry & organization standards
  • Implement and manage Reverse Proxy and Web Application Firewalls (Cloudflare WAF) to provide a unified application security posture that protects APIs & Web Applications at the edge and reduces client-side risks
  • Identify and deploy cybersecurity measures by continuously performing vulnerability assessments and risk management. Address Common Vulnerabilities and Exposures (CVE) as per established procedures
  • Design and implement Development, QA, UAT and Production application & database environments
  • Ensure application environments, tools & approved third-party components are kept up to date as per established patching & update procedures. Liaise with vendors to manage the monthly patching exercises
  • Ensure unplanned downtime is kept to a minimum (target: 0.00%)
  • Implement automated processes wherever possible with continuous modernization and upgrade of existing processes / scripts
  • Manage, configure and monitor application-related IAM actions
  • Respond to and mitigate production incidents
  • Conduct post-incident reviews (postmortems) to identify root causes and prevent future occurrences
  • Design and implement comprehensive monitoring systems to track system health and performance
  • Set up effective alerting mechanisms to notify teams of potential issues
  • Participate in code reviews, security audits, and performance testing to maintain the integrity of Cloud and Hybrid solutions
  • Manage and automate the deployment of software changes
  • Implement safe deployment practices, such as canary releases and blue-green deployments
  • Participate in and contribute to IT / Cyber Security Change Advisory Board meetings
  • Drive continuous technology transformation to minimize technical debt
  • Provide architecture direction for developers, recognizing custom and standard technical frameworks and GRC (Governance, Risk & Compliance) audit policies and procedures, including PII (Personally Identifiable Information) and CUI (Controlled Unclassified Information)
  • Participate in defining target state technology architecture and roadmaps, and ensure alignment of initiatives
  • Work closely with cross-functional application & infrastructure teams to produce comprehensive end-to-end solution opportunities
  • Design and implement monitoring dashboards within APM tools (Prometheus, Grafana, Datadog, or New Relic) to track AI-specific metrics such as API latency, token utilization, and foundation model error rates.
  • Set up cost-tracking alerts to monitor the consumption of Generative AI resources and prevent budget overruns in Development, QA, UAT, and Production environments.
  • Provide architectural guidelines for developers to ensure AI applications strictly adhere to GRC audit policies, specifically blocking the leakage of Personally Identifiable Information (PII) and Controlled Unclassified Information (CUI) into public AI training sets.
  • Maintain accurate Standard Operating Procedures (SOPs) detailing the failover and recovery mechanisms for AI-driven system capabilities.
  • Stay up-to-date with the latest technologies and security trends to ensure our solutions remain innovative, secure, and cost-efficient
  • Define and maintain documentation of architectural solutions and procedures (Standard Operating Procedures)
  • Configure and manage Web Application Firewalls (Cloudflare WAF) and API gateways to safeguard Generative AI endpoints from emerging threats like prompt injection and data exfiltration.
  • Integrate security guardrails into the development process to automatically scan and intercept unsafe data payloads sent to external or internal AI foundation models.
  • Architect and implement AIOps platforms to ingest, aggregate, and correlate telemetry data