The Support Manager is responsible for ensuring the availability, stability, and performance of the Loyalty platform. This role leads incident response, operational support, and continuous service improvement initiatives, working with the Engineering teams and the business to keep systems secure, highly available, and capable of delivering exceptional service. The ideal candidate brings strong experience with observability platforms such as Dynatrace and Azure Application Insights, advanced KQL-based diagnostics, and the ability to troubleshoot complex integrations across distributed systems and cloud platforms.
What You'll Do
Serve as the primary point of contact for all major incidents related to the team’s technologies and platforms.
Lead and coordinate incident response activities, ensuring rapid resolution and effective communication during critical events.
Manage the offshore support team, service availability, and the team's operational processes, monitoring workloads to maintain appropriate resource levels.
Ensure effective implementation and governance of the Incident and Problem Management process, including reporting and post-incident reviews.
Identify, initiate, schedule, and conduct incident reviews and root cause analysis to drive long-term resolution and prevent recurrence.
Monitor incidents to ensure that SLAs and operational performance targets are consistently met.
Ensure that all resolved incidents are properly closed and documented after end-user confirmation.
Maintain a strong business-level understanding of the critical applications that support key operational areas.
Use Dynatrace and Azure Application Insights to monitor application performance, detect anomalies, and proactively identify system degradation.
Use Kusto Query Language (KQL) to analyze telemetry, logs, and metrics when diagnosing production issues.
Troubleshoot complex integrations across distributed systems, including APIs, event-driven architectures, cloud services, and third-party platforms.
Proactively identify potential issues that require remediation and develop action plans in collaboration with business and IT teams.
Produce daily, weekly, and monthly operational reports that demonstrate SLA compliance and system performance.
Build strong relationships across RaceTrac to coordinate resolution effectively during critical incidents.
Maintain awareness of all operational and event-related activities and provide regular updates to senior management.
Drive continuous process improvement by reviewing incident management processes, operational workflows, tools, and technologies.
Job Type
Full-time
Career Level
Manager