Huntington National Bank - posted 2 days ago
Full-time • Mid Level
Hybrid • Columbus, OH

The Google Cloud Platform (GCP) Site Reliability Engineer (SRE) Manager is responsible for supporting the GCP framework and the consumers of the platform. The position reports to the Chief Development Office's (CDO) Cloud Infrastructure Acceleration team. The SRE Manager will lead a team of onshore and offshore SREs to develop Infrastructure as Code (IaC) and pipelines that provide platform, infrastructure, observability, and security capabilities. The qualified candidate will collaborate with the CDO, Application, Incident, Security, and Change Management teams to manage the ITIL process, reduce toil, enhance reliability, and drive innovation. The candidate will manage a team of developers whose goals are reliability, compliance, automation, enablement, and release-when-ready delivery, and will build a culture of support, continuous improvement, and learning.

  • Manage the GCP SRE team and discipline, maintain service levels, manage cost, and enhance operations.
  • Manage the Stack Overflow channel, GCP releases, and Disaster Recovery exercises.
  • Manage Platform RBAC, Firewall and User Access certifications.
  • Support GCP's third-party system integrations.
  • Develop SRE strategies, best practices, and knowledge base.
  • Develop monitoring and alerting capabilities to increase observability and availability and to reduce toil.
  • Participate in the DevSecOps model to build, assess, and implement SRE cloud solutions via IaC.
  • Collaborate with Incident, Cybersecurity, Application and SRE teams to troubleshoot issues, restore functionality, perform root cause analysis, and deliver enhancements.
  • Provide 24x7 GCP support and coordinate on-call rotations.
  • Conduct periodic blameless incident retrospectives with a focus on continuous improvement.
  • Conduct training sessions and simulated game days.
  • Experience with scripting and programming languages and concepts
  • Demonstrate knowledge of GCP, its CLI, services, and integrations.
  • Demonstrate knowledge of DevSecOps tool chains and processes.
  • Demonstrate knowledge of IaC software: Terraform, CLI, CDM, CFT, and ARM.
  • Demonstrate knowledge of Security as Code principles, policy, best practices, and tools.
  • Demonstrate knowledge of Credential, Certificate and Encryption best practices, rotation, and policies.
  • Experience using monitoring tools like Cloud Logging, Splunk, and Dynatrace to evaluate system health, develop dashboards, research issues, identify root causes and provide solution options.
  • Other duties as assigned
  • Minimum of 5 years of SRE experience with GCP, AWS, and/or Azure
  • Minimum of 5 years of experience developing automated solutions using IaC - Terraform or OpenTofu. Additional experience is a plus: Python, PowerShell, Ansible, Chef, Ruby, and JSON.
  • Minimum of 3 years managing onshore or offshore teams.
  • Bachelor's degree or equivalent work experience
  • Experience troubleshooting cloud-based technologies.
  • Cloud (GCP, AWS, Azure) and/or IaC certifications and/or work experience
  • Experience in Agile delivery, Azure DevOps Services, CI/CD Pipelines, Monitoring and Security tools.
  • Security tool integration experience: Prisma, Snyk, or Gitleaks.
  • Experience with cloud security, IAM, Security Scans and custom policies.
  • Full stack engineering knowledge – application, network, infrastructure, and security
  • Understanding of containers and serverless computing concepts
  • Background in application, database, and infrastructure monitoring tools
  • Willingness to guide others and outstanding communication skills
  • Familiarity with the financial industry
  • health insurance coverage
  • wellness program
  • life and disability insurance
  • retirement savings plan
  • paid leave programs
  • paid holidays
  • paid time off (PTO)