Application Support Engineer

Confiz

About The Position

We at Confiz are hiring an Application Support Engineer with hands-on experience in monitoring tools, backend API troubleshooting, and incident management. Join our team to ensure seamless application performance and drive operational excellence.

Requirements

Bachelor’s or Master’s degree in Computer Science, Engineering, Information Technology, or a related field
3+ years of experience in Application Support, Production Support, or Site Reliability Engineering roles
Must have hands-on experience with the Azure cloud platform
Strong knowledge of backend APIs (REST/GraphQL) and the ability to read logs and troubleshoot from them
Hands-on experience with Azure App Insights and at least one observability platform such as New Relic, Dynatrace, Grafana, or Splunk
Proficiency in SQL, with the ability to write complex queries for data validation and impact analysis
Experience in dashboard creation, alert configuration, and monitoring solutions
Strong problem-solving skills with the ability to identify patterns across incidents
Excellent writing skills for ticket documentation and Root Cause Analysis (RCA) preparation
Strong collaboration skills across engineering, QA, and operations teams
Proactive attitude with a mindset of continuous improvement and ownership
Experience in system performance monitoring and analyzing health metrics
Understanding of service SLAs, error budgets, and uptime reporting
Experience with incident management and on-call support processes
Familiarity with release management and post-release validation

Responsibilities

Troubleshoot issues by analyzing backend API logs and Azure App Insights
Build and maintain dashboards and alerts in Azure App Insights to monitor APIs, endpoints, and system/app health (spikes, deviations, anomalies)
Create monitoring solutions in platforms like New Relic, Dynatrace, Splunk, or Grafana
Ensure all support tickets are fully documented with detailed log analysis, journey insights, and resolution updates
Perform post-release log analysis to identify early issues or abnormal patterns
Ensure service SLAs are consistently met and take proactive measures to prevent breaches
Detect and investigate recurring issues/patterns across multiple cases and escalate to leads/ops teams
Write clear Root Cause Analysis (RCA) reports for all major (P1/P2) incidents, including identification of impacted customers via logs
Use SQL effectively for data validation, impact analysis, and troubleshooting
Maintain a proactive approach, continuously improving monitoring and incident management processes