ITIL Problem Management Analyst

OSI Digital | Irvine, CA
$25 - $50 | Onsite

About The Position

We are seeking an experienced Problem Management Lead Analyst to drive service stability and continuous improvement across IT Operations. This role will own the end-to-end Problem Management process, lead root cause investigations, identify systemic issues, and partner with technology teams to eliminate recurring incidents and improve overall service reliability.

Requirements

  • 5+ years of experience in IT Operations with at least 3 years focused on Problem Management, Service Reliability, or IT Service Management.
  • ITIL Foundation certification required; ITIL Managing Professional or Advanced certifications preferred.
  • Strong hands-on experience with ServiceNow Problem Management, Incident Management, and reporting modules.
  • Proven experience conducting complex Root Cause Analysis and facilitating cross-functional problem review sessions.
  • Strong understanding of enterprise IT infrastructure including Servers, Cloud, Network, End User Computing, and Applications.
  • Experience developing metrics, dashboards, and executive reporting.
  • Excellent facilitation, communication, and stakeholder management skills.
  • Ability to influence technical teams and drive resolution of long-standing operational issues.

Nice To Haves

  • Experience implementing Problem Management programs or maturing ITSM processes.
  • Familiarity with SRE, Reliability Engineering, or Operational Excellence frameworks.
  • Experience with Power BI, Tableau, or ServiceNow Performance Analytics.
  • Knowledge of automation platforms and operational process optimization.

Responsibilities

  • Own and govern the end-to-end Problem Management lifecycle in alignment with ITIL best practices.
  • Lead proactive and reactive problem investigations to identify underlying causes of recurring incidents and service disruptions.
  • Facilitate Root Cause Analysis (RCA) sessions using structured methodologies such as 5 Whys, Fishbone Analysis, Fault Tree Analysis, and Kepner-Tregoe.
  • Establish and maintain a Known Error Database (KEDB), ensuring accurate documentation of known errors and workarounds.
  • Track corrective and preventive actions through resolution and verify effectiveness of implemented fixes.
  • Drive accountability across Infrastructure, Network, Cloud, Security, End User Computing, and Application teams to resolve systemic issues.
  • Analyze incident, change, and operational data to identify trends, recurring issues, and opportunities for service improvement.
  • Develop and present actionable recommendations to improve platform stability, reduce incident volumes, and enhance service performance.
  • Lead recurring service review meetings focused on problem trends, chronic issues, and risk mitigation.
  • Identify automation opportunities and process improvements that reduce operational effort and prevent recurring incidents.
  • Contribute to operational excellence initiatives, knowledge management, and runbook enhancements.
  • Utilize ServiceNow Problem Management capabilities to manage problem records, known errors, corrective actions, and reporting.
  • Establish KPIs and metrics related to problem management effectiveness, including recurring incident reduction, RCA completion, and corrective action closure.
  • Create executive-level dashboards and reports highlighting service health trends, top recurring issues, and improvement initiatives.
  • Ensure compliance with ITIL processes, documentation standards, and audit requirements.
  • Partner with Major Incident Management teams to ensure high-priority incidents are transitioned into formal problem investigations when appropriate.
  • Lead Post-Incident Reviews (PIRs) focused on identifying root causes and preventive actions.
  • Collaborate with Change Management teams to ensure corrective actions are properly planned, tested, and implemented.
  • Assess risks associated with recurring issues and provide recommendations for long-term remediation.