Site Reliability Engineer (SRE) – II

Huntington National Bank
Easton, OH
Hybrid

About The Position

This employer is seeking a Site Reliability Engineer (SRE) Level II who is a natural problem solver, thrives in high-pressure situations, and enjoys cross-team collaboration to maintain system stability. The ideal candidate combines strong technical expertise with excellent communication and collaboration skills to support and scale critical systems. The role suits someone personable and proactive who can quickly grasp complex issues, guide troubleshooting, and adapt to new technologies. The SRE will work with developers, support teams, and business stakeholders to resolve problems and enhance reliability. The position sits on a team dedicated to keeping systems resilient, scalable, and well supported, with particular emphasis on leading incident response, communicating clearly, and driving continuous improvement in both systems and processes.

Requirements

  • Bachelor’s degree in Computer Science or Information Technology.
  • 3+ years of experience in site reliability engineering, DevOps, systems administration, or related roles.

Nice To Haves

  • Strong troubleshooting and communication skills in production environments.
  • Experience supporting applications in both .NET and Spring Boot frameworks.
  • Familiarity with OpenShift, Windows Server, and hybrid deployment environments.
  • Proficiency in log analysis using SQL queries and Splunk.
  • Strong scripting skills (e.g., PowerShell, Bash, Python).
  • Familiarity with cloud platforms (AWS, GCP, etc.).
  • Hands-on experience with observability tools (Dynatrace, Datadog, etc.).
  • Strong interpersonal skills and a customer-focused mindset.

Responsibilities

  • Lead real-time troubleshooting efforts for high-impact production issues.
  • Collaborate across IT and engineering teams to resolve incidents quickly and effectively.
  • Provide mentorship and guidance to junior SREs and support staff.
  • Participate in on-call rotations and act as an escalation point.
  • Build and maintain automation using tools like Terraform, Ansible, or CloudFormation.
  • Eliminate manual tasks and improve reliability through scripting and automation.
  • Build and optimize monitoring dashboards using tools like Prometheus, Dynatrace, or Splunk.
  • Ensure visibility into system health and proactively detect issues.
  • Drive improvements in deployment, monitoring, and incident response processes.
  • Champion best practices across the SRE and support teams.