SRE Engineer - Mainframes

Huntington National Bank•Easton, OH

1d•Hybrid

About The Position

The Site Reliability Engineer is responsible for technical support of various complex applications. Provide guidance on troubleshooting incidents and implementing any corrective actions. Possess in-depth knowledge of, and work with, the technical tools available for monitoring and supporting the systems. Perform technical analysis. The individual in this role will frequently work with external vendor representatives that interface with our internal applications. Develop short- and long-term solutions to issues that come up on a day-to-day basis, including defects and recurring issues. Build monitoring, alerting, availability, and uptime into applications and reduce toil. Understand industry standards and practices and technology industry trends as they apply to Huntington. Drive process enhancements, streamline communication, and optimize delivery through innovative practices, from discovery to deployment. Focus primarily on automations, dashboards, continuous service improvement, eliminating toil, proactive initiatives, and reducing recurring issues and changes. Participate with other developers and/or contractors in troubleshooting and identifying problems. Work with Development teams to document production issues that require code fixes, and assist with validating and closing out any defects. Manage customer-impacting incidents, including business impact assessment, technical resolution, engagement, and communications. Ensure monitoring alerts and system events are assessed, prioritized, and assigned. Update the knowledgebase with support information (e.g., known errors and solutions for these errors).
Provide clarifications on data issues identified in the application. Work closely with the Problem review team for ongoing tracking and mitigation of any known system problems. Escalate incidents to appropriate interfacing support teams or external teams such as product vendors. Experience developing repeatable processes and metrics that maximize uptime, reliability, and predictability.

Requirements

2+ years of working experience with Mainframes, SQL Server, or equivalent
2+ years of experience with batch scheduling tools such as Zeke, Zena, etc.
2+ years of experience with ITIL tools such as ServiceNow
2+ years of experience with z/OS and MVS, including supporting tools, languages, etc. (TSO, ISPF, COBOL, Easytrieve, JCL, IBM utilities, SORT, FileAid, Xpediter, ChangeMan, AbendAid)
2+ years of experience providing problem resolution support specific to CICS and MQ on z issues: identifying and resolving systems application problems, and coordinating with programming application users to determine symptoms and ensure accurate problem resolution
2+ years of experience in Software Engineering and Site Reliability

Nice To Haves

Excellent oral and written communication skills
Exhibit best practices, follow standards, and present suggestions while remaining flexible and open
Hands-on knowledge of APIs, web services, and microservices, specifically from an SRE/Support perspective
Expertise utilizing cloud infrastructure such as Azure or AWS
Hands-on experience with monitoring tools such as Dynatrace, Splunk, Zenoss
In-depth knowledge of different SDLC methodologies including Waterfall, Agile, etc.

Responsibilities

Provide technical support for complex applications.
Troubleshoot incidents and implement corrective actions.
Work with technical tools for monitoring and supporting systems.
Perform technical analysis.
Work with external vendor representatives.
Develop short- and long-term solutions to issues.
Build monitoring/alerting/availability/uptime into applications.
Understand industry standards and technology trends.
Enhance processes, streamline communication, and optimize delivery.
Focus on Automations, Dashboards, Continuous Service Improvement.
Participate in troubleshooting and identifying problems.
Work with Development teams to document production issues and validate defects.
Manage customer-impacting incidents.
Assess, prioritize, and assign monitoring alerts and system events.
Update knowledgebase with support information.
Provide clarifications on data issues.
Work with Problem review team for tracking and mitigation of system problems.
Escalate incidents to appropriate support teams or external teams.
Develop repeatable processes and metrics to maximize uptime, reliability, and predictability.