Senior Site Reliability Engineer - GCP & Container Platforms

Wells Fargo Bank
Charlotte, NC
Hybrid

About The Position

Overview

We are seeking a Senior Site Reliability Engineer (SRE) to help develop our platform operations across Windows, Linux, and cloud-native environments. This role is central to our transformation from app-specific support to platform-wide reliability engineering. You will bring deep expertise in Google Cloud Platform (GCP), container orchestration, and automation, enabling scalable, secure, and resilient infrastructure that supports diverse applications across our enterprise.

Key Responsibilities

Platform Reliability & Cloud Engineering

  • Ensure high availability, performance, and security of production systems across Windows, Linux, and GCP environments.
  • Engineer and support containerized workloads using Kubernetes (GKE) and Docker, enabling scalable microservices architectures.
  • Lead infrastructure provisioning and configuration using Terraform, Ansible, and GCP-native tools.

Automation & Observability

  • Develop automation scripts and pipelines to eliminate manual toil and accelerate incident response.
  • Implement observability frameworks using SLIs/SLOs, Prometheus, Grafana, and GCP Operations Suite.
  • Drive proactive monitoring, alerting, and telemetry across hybrid environments.

Incident Management & Resilience

  • Lead incident response, root cause analysis, and postmortems.
  • Build self-healing systems and automated remediation workflows using GCP-native services and scripting.

Security & Compliance

  • Collaborate with InfoSec to enforce hardening standards, manage vulnerabilities, and support compliance initiatives.
  • Integrate security into CI/CD pipelines and container platforms using IAM, encryption, and policy enforcement.

Collaboration & Enablement

  • Partner with developers, application owners, and infrastructure teams to deliver reliable, cloud-native platforms.
  • Document configurations, runbooks, and operational procedures to enable cross-team reuse and transparency.

Requirements

  • 4+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education.
  • 4+ years of experience in Windows Server administration and production support.
  • Strong scripting skills in PowerShell, Python, or Shell.
  • Hands-on experience with GCP services, including GKE, IAM, Cloud Functions, and Cloud Monitoring.
  • Proficiency in container technologies: Docker and Kubernetes.
  • Familiarity with Linux system administration and hybrid cloud environments.
  • Experience with infrastructure-as-code tools: Terraform, Ansible.
  • Strong understanding of Active Directory, DNS, DHCP, and Windows security principles.
  • Ability to work on-site at one of the listed locations in a hybrid environment.
  • Ability to work outside normal business hours, including nights and weekends, on a limited or rotational basis.
  • We are not considering candidates who require visa sponsorship.

Nice To Haves

  • Security certifications (e.g., CISSP, Security+, GCP Professional Cloud Security Engineer).
  • Experience with CI/CD tools (e.g., GitLab CI and Jenkins).
  • Familiarity with ITIL practices and change management.
  • Exposure to ServiceNow, load balancers, certificate management, and endpoint protection tools.
