We are seeking a Senior Site Reliability Engineer who will be the guardian of our Azure infrastructure reliability. This role focuses on building comprehensive observability platforms, implementing intelligent monitoring systems, and proactively identifying issues before they impact production. You will create the tools and automation that predict, detect, and prevent problems rather than simply reacting to them. Your primary mission is ensuring our Azure infrastructure and applications never surprise us with failures. The ideal candidate has deep expertise in Azure Monitor, Application Insights, Log Analytics, and KQL, combined with strong scripting skills in Python or PowerShell. You should have 5-7+ years of experience implementing observability platforms and a proven track record of preventing incidents through proactive monitoring and automation. You'll work with technologies like Prometheus, Grafana, OpenTelemetry, and Azure services (AKS, App Services, Azure SQL, Cosmos DB) while building self-healing automation and predictive analytics tools that keep our systems healthy.
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed