Incident Recovery Manager

Perficient
Charlotte, NC
$81,978 - $149,880
Onsite

About The Position

The Incident Recovery Manager will lead the recovery of critical incidents following major disruptions and drive strategies to ensure effective, timely restoration of services. This role is responsible for managing major incident recovery, leading post-mortem reviews, developing SOPs and runbooks, and implementing proactive measures to prevent future failures through the adoption of SRE principles.

As a dynamic and motivated leader, you will play a key role in shaping and enhancing production support and operational capabilities. You will help identify and implement best-fit solutions that improve stability, reliability, and efficiency across the organization. This is an opportunity to drive meaningful change and enhance the end-user experience. If you are passionate about production operations, stability, SRE, and observability, and have a proven track record of success, we invite you to join us in advancing resilience and operational excellence.

Perficient is always looking for the best and brightest talent and we need you! We’re a quickly growing, global digital consulting leader, and we’re transforming the world’s largest enterprises and biggest brands. You’ll work with the latest technologies, expand your skills, and become a part of our global community of talented, diverse, and knowledgeable colleagues.

Perficient is the global AI and technology consulting firm disrupting the traditional consulting model. Powered by our 7,000+ advisors, engineers, and designers, Perficient implements AI-first solutions that break conventions and deliver outcomes that matter. Proudly serving clients that represent the world’s most innovative brands, and in collaboration with our powerful technology partner ecosystem, we bring deep industry expertise and data-driven design to redefine how businesses run and succeed. Perficient is different. For real. Learn more at perficient.com.

Requirements

  • Proven track record of success in production operations, stability, SRE, and observability.

Responsibilities

  • Lead the recovery of critical incidents following major disruptions.
  • Drive strategies to ensure effective, timely restoration of services.
  • Manage major incident recovery.
  • Lead post-mortem reviews.
  • Develop SOPs and runbooks.
  • Implement proactive measures to prevent future failures through the adoption of SRE principles.
  • Shape and enhance production support and operational capabilities.
  • Identify and implement best-fit solutions that improve stability, reliability, and efficiency across the organization.