In the Role
Ensure all closed incidents within the assigned scope have a corresponding problem ticket.
Schedule and facilitate formal Root Cause Analysis (RCA) sessions following the resolution of major incidents (P1, P2, P3).
Invite all relevant stakeholders involved in incident response and resolution (e.g., L2/L3 teams, Monitoring Analysts, Incident Commanders, Scribes, Infrastructure, Network, Help Desk).
Conduct RCAs within established SLA timelines.
Perform detailed problem analysis for all problem tickets within the assigned domain.
Apply critical thinking techniques such as the “5 Whys” to identify root causes.
Prioritize tickets based on incident severity (P1, P2, P3, P4).
Assign action items and partner with cross-functional teams to ensure closure, including: resolving underlying technical root causes; addressing process gaps and training needs; enhancing logging, monitoring, and alerting to improve early detection and resolution; identifying and remediating similar or related issues; capturing long-term improvement opportunities; and updating SOPs and knowledge management documentation.
Define and implement temporary workarounds until permanent resolutions are in place.
Manage reporting on open, aging, and recurring problems, as well as recently closed issues.
Provide categorization, trend analysis, and domain-level insights.
Facilitate review meetings to drive accountability and progress toward closure.
Build and maintain knowledge management resources, including creating and curating articles to support self-service and operational efficiency.
Proactively identify problems through trend analysis of incidents and monitoring data.
Open and manage problem tickets as needed, ensuring follow-through to resolution.
Provide leadership and guidance to cross-functional teams in root cause identification, corrective action planning, and execution.
Mentor and coach members of the Problem Management function.
Recommend improvements to mature Problem Management practices and, upon leadership alignment, help drive implementation of those enhancements.
Support high-risk releases by assisting development teams in creating effective test cases to ensure business requirements are met.
Participate in User Acceptance Testing (UAT) and regression testing to prevent incidents or identify issues prior to release.
Support and/or execute code releases as needed.
Job Type
Full-time
Career Level
Mid Level
Education Level
Associate degree