Major Incident & Problem Manager - Remote

South State Bank, Winter Haven, FL
$66,440 - $106,131, Remote

About The Position

The SouthState story is one of steady growth, deep community roots, and an unwavering commitment to helping our customers move forward. From our beginnings in the 1930s to becoming a trusted financial partner across the South and beyond, we are known for combining personal relationships with forward-thinking solutions. We are committed to helping our team members find their success while maintaining the integrity of our values: building trust, fostering lasting relationships, and pursuing excellence. At SouthState, individual contributions are recognized, potential is cultivated, and team members are inspired to achieve their greater purpose. Your future begins here!

Summary/Objectives

We are seeking an experienced Major Incident & Problem Manager to oversee the rapid resolution of high-impact incidents within a large-scale financial services environment. This role is critical to ensuring operational stability and minimizing downtime. The Major Incident & Problem Manager is responsible for the end-to-end management of all major IT incidents. The role's duties are extremely varied and are listed in full under Responsibilities.

Requirements

  • Proven experience in major incident management within large-scale, regulated financial institutions.
  • Proven incident management experience.
  • Leadership experience
  • Outstanding communication skills, both written and verbal, and a strong team-oriented attitude
  • Broad IT knowledge and experience
  • Excellent troubleshooting skills with experience
  • Strong understanding of the Information Technology Infrastructure Library (ITIL) framework
  • Understanding of application architecture at a high level and its related components like network, storage, DB, OS.
  • Familiarity with banking systems and payment platforms.
  • Understanding of cloud environments, including SaaS, IaaS, and PaaS
  • Education: Bachelor’s degree in an IT-related or related field
  • 5+ years of operational or incident management experience
  • 3+ years of experience using enterprise communication tools such as PagerDuty, Teams and Outlook
  • Ability to communicate effectively both up and down the organization and to switch easily between business and IT employees.
  • Ability to effectively multitask and handle competing priorities.
  • Experience with incident management tools (e.g., ServiceNow, PagerDuty).
  • Proficiency in Microsoft Office Suite (Excel, PowerPoint, Teams)
  • Experience working remotely while staying engaged with various teams.

Nice To Haves

  • Experience in financial services or banking domains

Responsibilities

  • Lead the end-to-end management of major incidents affecting critical banking and financial systems.
  • Leverage PagerDuty to issue all communications and provide notifications and updates to key stakeholders and management.
  • Serve as the primary point of escalation during high-severity incidents, ensuring clear communication with executives and impacted business units.
  • Facilitate and chair all investigation activities, meetings, and conference calls.
  • Form action plans with specific actions, roles, and deadlines, and ensure they are completed.
  • Manage processes and resources, including third parties, and resolve conflicts to keep work moving toward resolution.
  • Be accountable for resolving the outage via a workaround or permanent fix.
  • Ensure all administration and reports are maintained and up to date, including weekly post-major-incident reviews, and host problem management reviews.
  • Support and nurture process and knowledge base improvements.
  • Continually maintain and develop the PagerDuty tool and resources to manage major incidents effectively.
  • Provide periodic major incident and problem management metrics reports.
  • Associate incidents with other records (e.g., Incidents, Changes, Problems, Knowledge Articles, Known Errors).
  • Help drive resolution and ownership for client-impacting incidents that have been promoted to a major incident.
  • Act as first escalation point for Bank Operations in the event of suspected or confirmed service interruption or degradation.
  • Record and classify received Incidents and undertake an immediate effort to restore a failed IT Service as quickly as possible.
  • Escalate to service teams, senior management, and leaders to ensure appropriate awareness, engagement, and focus.
  • Coordinate large cross-functional teams.
  • Report on metrics and analyze trends in vendor, in-house, and process failures.
  • Produce accurate and timely communications tailored to the relevant audience (senior leaders and internal stakeholders).
  • Work closely with SMEs to quickly identify customer impact (who, how, when).
  • Coordinate and drive restoration of service for major incident events.
  • Manage vendor relations and escalations.
  • Identify root cause and mitigation.
  • You'll be the quarterback of conference calls, commanding and controlling aspects of the meetings to maintain details of problems, identify needs to mitigate impact, take steps to restore service, and provide ETAs of resolutions.
  • You'll drive calls with upwards of 50 associates and engage teams needing root cause analysis (RCA) for issues.
  • Identify trends related to incidents, changes, and problems and recommend viable solutions to prevent future occurrences, partnering with multiple divisions throughout the firm to drive a reduction in client-impacting incidents.
  • Open problem records and assign tasks for all incidents you own.
  • Provide a graceful handoff to the IT Services Director if an incident escalates to P1, while staying engaged throughout the entire life cycle.
  • Help with the review of and the acceptance or rejection of major incident proposals.
  • Support and drive the automation, elimination, and simplification of our current processes.
  • Propose and undertake routine preventative actions to avoid service interruption or degradation.
  • Ensure the closure of all incident records.
  • Host and participate in incident reviews (Problem Management) to track updates, corrective action plans, root cause, and preventative action plans.