ITIL Incident Management/IT Ops Analyst

OSI Digital
Irvine, CA
Onsite

About The Position

Problem Management Execution
  • Own the end-to-end Problem Management lifecycle in alignment with ITIL practices
  • Perform Root Cause Analysis (RCA) using structured methodologies (5 Whys, Fishbone, etc.)
  • Identify and track Known Errors and maintain the Known Error Database (KEDB)
  • Drive permanent fixes in collaboration with L2/L3 infrastructure and application teams

Incident & Major Incident Support
  • Support the Major Incident Management (MIM) process during high-severity incidents
  • Participate in incident bridge calls, ensuring proper coordination and communication
  • Conduct Post-Incident Reviews (PIRs) and ensure actionable follow-ups

ServiceNow & Process Governance
  • Utilize ServiceNow for Problem, Incident, and Change tracking
  • Ensure data quality, categorization accuracy, and SLA adherence
  • Create dashboards and reports for trend analysis and recurring issue identification

Continuous Improvement & Operations Alignment
  • Identify recurring incidents and systemic issues impacting service stability
  • Partner with Infrastructure, Network, and Application teams to drive preventive actions
  • Recommend automation opportunities and operational improvements
  • Contribute to runbook and knowledge base enhancements

Requirements

  • 3–5 years of hands-on experience in Problem Management or Incident Management within IT Operations
  • ITIL certified
  • Strong working experience with ServiceNow (Problem Management or Major Incident modules)
  • Solid understanding of IT Infrastructure domains (Servers, Network, End-User Compute, Applications)
  • Experience conducting Root Cause Analysis and Post-Incident Reviews
  • Willingness to participate in 24×7 on-call rotation
  • Flexibility to support early morning / late evening operational needs

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No Education Listed
  • Number of Employees: 501-1,000 employees