NorthMark Compute & Cloud (NMC²) is backed by dedicated leadership and investment, with a clear mission as it operates at the bleeding edge of technology. Its goal is to scale and enhance the high-performance computing (HPC) and cloud infrastructure that supports its clients' research, production, and delivery, enabling breakthroughs that shape the industries of tomorrow. Its engineers build critical infrastructure to eliminate friction in scientific research, simulations, analysis, and decision-making, accelerating discovery and driving faster innovation.
The Position
The Incident & Problem Manager is accountable for establishing and operating the Incident Management and Problem Management practices within NMC², ensuring that service disruptions are resolved quickly, root causes are identified and eliminated, and lessons learned drive continuous improvement across the ITSM ecosystem. This combined role owns the full lifecycle of reactive and proactive service restoration, from initial detection and triage through resolution, root cause analysis, and known error documentation, ensuring minimal business impact and sustained service reliability.
The ITSM team is responsible for the reliability and stability of services across NMC²’s infrastructure and operations. Within that team, the Incident & Problem Manager owns the end-to-end lifecycle of service disruptions, ensuring rapid restoration, effective escalation, and long-term resolution of underlying issues. Working alongside Service Desk, Engineering, Data Center Operations, and vendors, you will lead major incident response, drive root cause analysis, and implement continuous improvement across the ITSM ecosystem. This role plays a critical part in maintaining service availability and improving operational maturity at scale.
Job Type
Full-time
Career Level
Mid Level