Senior DevOps Engineer – Google Cloud Platform

ELITS • Montreal, QC

About The Position

As a Senior DevOps Engineer (GCP), you will be a key member of our platform team, responsible for designing, building and operating scalable, cloud-native infrastructure and tooling on Google Cloud Platform. You will work closely with development, data and operations teams to ensure reliable streaming, high availability and smooth deployments in production. You will take ownership of our GCP environments (networking, security, Kubernetes, observability and CI/CD) and help drive best practices for reliability, performance and cost efficiency.

Requirements

7+ years of experience in DevOps, SRE, Platform Engineering or similar roles, including strong hands-on experience running production workloads in cloud environments, ideally on Google Cloud Platform.
Strong hands-on experience with streaming technologies and real-time data processing (for example, Apache Kafka, Google Pub/Sub, Kinesis, Pulsar or equivalent).
Solid background in distributed systems: microservices, event-driven architectures, scalability and fault tolerance.
Strong understanding of hardware and infrastructure concepts (servers, networking, storage) and experience with on-prem or hybrid environments integrated with GCP.
Deep knowledge of Linux/Unix operating systems, system internals, performance and troubleshooting.
Extensive experience with cloud-native technologies:
Containers and orchestration: Docker, Kubernetes (GKE in particular; EKS, AKS or other managed Kubernetes is a plus).
Infrastructure as Code: Terraform, Helm (and/or similar tools) targeting GCP resources.
CI/CD pipelines: Cloud Build, GitHub Actions, Jenkins, Argo CD or equivalent.
Observability: monitoring, logging and alerting using tools such as Cloud Monitoring/Logging, Prometheus, Grafana, ELK/EFK.
Strong hands-on experience with Google Cloud Platform services, for example: GKE, Compute Engine, Cloud Run, Cloud Storage, Cloud SQL/BigQuery, VPC networking and IAM.
Good understanding of networking (VPC, VPN, load balancing, DNS, certificates, IPsec, private connectivity).
Experience with agile ways of working and tools such as Jira and Git.
Strong debugging and troubleshooting abilities across multiple layers (application, infrastructure, network).
Ability to understand users’ technical issues and provide clear, pragmatic recommendations.
Fluent in English (spoken and written).

Nice To Haves

Google Cloud certifications such as Professional Cloud DevOps Engineer, Associate Cloud Engineer or Professional Cloud Architect.
Experience with Java or another backend language (for example, Go, Python) in a DevOps/platform context.
Experience with system testing or test automation for distributed systems and APIs.
Experience with security best practices in cloud-native environments (secrets management, hardening, RBAC, IAM, policy-as-code) on GCP.
Experience with landing zone or platform engineering approaches (for example, standardized GCP project structures, shared services, golden paths).

Responsibilities

Design, deploy and operate containerized microservices and distributed systems on GCP, primarily using GKE (Google Kubernetes Engine).
Build and maintain CI/CD pipelines (for example, Cloud Build, GitHub Actions, Argo CD) that run automated tests and enable frequent, reliable releases to GCP environments.
Implement and manage real-time streaming data platforms (for example, Kafka on GCP, Pub/Sub or similar technologies) for low-latency, high-throughput workloads.
Design and operate GCP infrastructure with a strong focus on reliability, performance, security and cost efficiency (for example, leveraging Compute Engine, Cloud Run, Cloud Storage, Cloud SQL).
Own infrastructure as code (IaC) for GCP using tools such as Terraform and Helm for repeatable, auditable environments and standardized landing zones.
Configure and operate observability on GCP (Cloud Monitoring, Cloud Logging, Prometheus, Grafana) and lead incident response, troubleshooting and performance tuning on Linux-based systems and containers.
Collaborate with development teams to improve operability, observability and resilience of services (SRE mindset: SLIs/SLOs, error budgets, post-mortems).
Document architectures, runbooks and operational procedures, and contribute to continuous improvement of processes, tooling and platform standards.
