Production Support Engineer

Capgemini, Toronto, ON

About The Position

Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you’d like, where you’ll be supported and inspired by a collaborative community of colleagues around the world, and where you’ll be able to reimagine what’s possible. Join us and help the world’s leading organizations unlock the value of technology and build a more sustainable, more inclusive world.

Job Description

  • Monitoring and Alerting: Implement and maintain monitoring systems to proactively identify potential issues and alert engineers to problems before they impact users.
  • Incident Response: Respond to incidents and outages, diagnose problems, and implement solutions to minimize downtime and restore service.
  • Automation: Automate repetitive tasks and processes to improve efficiency and reduce manual effort.
  • Performance Optimization: Identify and address performance bottlenecks to ensure systems run efficiently and effectively.
  • Infrastructure Management: Manage and maintain the underlying infrastructure, including servers, networks, and cloud resources.
  • Capacity Planning: Plan for future capacity needs to ensure systems can handle anticipated workloads.
  • Release Engineering: Develop and maintain processes for deploying software updates and releases.
  • Collaboration: Work closely with developers, operations teams, and other stakeholders to ensure system reliability and availability.
  • Documentation: Maintain clear and concise documentation of systems, processes, and procedures.
  • Continuous Improvement: Identify areas for improvement and implement changes to enhance system reliability and performance.

Requirements

  • Cloud Platform (OCP, Microsoft Azure)
  • 8+ years of experience in production support, handling production incidents
  • Excellent knowledge of OCP and Windows environments
  • Monitoring tools (Dynatrace, Splunk)
  • Operating System (Windows, Linux)
  • Scripting (Shell scripting, Python, PowerShell)
  • Ansible
  • Database (MySQL, Oracle, SQL database management)
  • Container Services (Kubernetes)

Benefits

  • Paid time off based on employee grade (A-F), as defined by policy: vacation (12-25 days, depending on grade), company-paid holidays, personal days, and sick leave
  • Medical, dental, and vision coverage (or provincial healthcare coordination in Canada)
  • Retirement savings plans (e.g., 401(k) in the U.S., RRSP in Canada)
  • Life and disability insurance
  • Employee assistance programs
  • Other benefits as provided by local policy and eligibility