Site Reliability Engineer

KPMG•Toronto, ON

91d•$105,000 - $156,600

About The Position

At KPMG, you’ll join a team of diverse and dedicated problem solvers, connected by a common cause: turning insight into opportunity for clients and communities around the world. The OPS Site Reliability Engineer is a focal role that owns and ensures the smooth operation of Managed Services offerings in the KPMG production cloud environment. The role focuses on driving high reliability into systems by working closely with the development DevOps engineers, security teams, and IT operations teams (Canadian and global), as well as our client-facing advisory teams. The role is in charge of defining, documenting, and operating production environments. The main responsibilities include the creation and setup of new product/service environments, DevOps provisioning, development and adoption of new OPS tools and methodologies, day-to-day governance of the production offerings, continuous improvement of the OPS process, collaboration on pipeline automation with the development DevOps engineers, periodic maintenance, and day-to-day troubleshooting of support / helpdesk / business escalations. We are looking for a DevOps expert with strong operations-related experience.

Requirements

Bachelor's degree in Computing Science or a related field, or an equivalent technical diploma combined with relevant experience
At least 3 years of experience in IT Operations
Working in large and complex IT environments
Working with UNIX/Linux/BSD systems and/or Windows server systems as an application or database administrator
Proven experience administering large applications in production grade cloud, with emphasis on Azure (other cloud: AWS, GCP)
Experience with configuration management (Chef, Ansible, Puppet)
Experience with proactive governance, monitoring, and continuous operational improvement processes
At least 3 years of experience with DevOps CI/CD development
Automating builds, releases, and pipelines; knowledge of GitHub Actions is an advantage (other tools: Jenkins, Travis CI, Azure DevOps)
Experience with infrastructure as code; knowledge of Terraform is an advantage (other tools: AWS CloudFormation)
Other software development experience, including scripting with Ruby, Python, Bash, PowerShell, and Java
Experience with Git branching and source code management within an enterprise team setup
At least 2 years of experience managing teams in the IT and/or software development space

Responsibilities

Documentation of processes in OPS production environments
Continuous improvement of the OPS production processes and increasing the performance of OPS KPIs, including the development and/or adoption of tools and technologies
Collaborating with the development teams on DevOps automation, including the development and adoption of pipelines for OPS production environments
Provisioning of pre-prod (with client data) and production environments. Collaborating with security and architecture teams
Continuous proactive governance and governance processes in production environments
Troubleshooting of support / helpdesk / business escalations
Periodic maintenance
