Site Reliability Engineer (SRE) – II

Huntington National Bank
76d

About The Position

Are you a natural problem solver who thrives in high-pressure situations and enjoys working across teams to keep systems running smoothly? We're looking for a Site Reliability Engineer (SRE) Level II who brings not only technical expertise but also strong communication and collaboration skills to help support and scale our critical systems. This role is ideal for someone who is personable and proactive, with a knack for jumping into complex issues and guiding troubleshooting conversations. You’ll be part of a team that ensures our systems are resilient, scalable, and well-supported. While technical depth is important, we’re especially looking for someone who can lead incident response, communicate clearly, and drive continuous improvement in both systems and processes.

Requirements

  • Bachelor’s degree in Computer Science or Information Technology.
  • 3+ years of experience in site reliability engineering, DevOps, systems administration, or related roles.

Nice To Haves

  • Strong troubleshooting and communication skills in production environments.
  • Experience supporting applications in both .NET and Spring Boot frameworks.
  • Familiarity with OpenShift, Windows Server, and hybrid deployment environments.
  • Proficiency in log analysis using SQL queries and Splunk.
  • Strong scripting skills (e.g., PowerShell, Bash, Python).
  • Familiarity with cloud platforms (e.g., AWS, GCP).
  • Hands-on experience with observability tools (e.g., Dynatrace, Datadog).
  • Strong interpersonal skills and a customer-focused mindset.

Responsibilities

  • Lead real-time troubleshooting efforts for high-impact production issues.
  • Collaborate across IT and engineering teams to resolve incidents quickly and effectively.
  • Provide mentorship and guidance to junior SREs and support staff.
  • Participate in on-call rotations and act as an escalation point.
  • Build and maintain automation using tools like Terraform, Ansible, or CloudFormation.
  • Eliminate manual tasks and improve reliability through scripting and automation.
  • Build and optimize monitoring dashboards using tools such as Prometheus, Dynatrace, and Splunk.
  • Ensure visibility into system health and proactively detect issues.
  • Drive improvements in deployment, monitoring, and incident response processes.
  • Champion best practices across the SRE and support teams.