Azure Lead Operations Engineer

Ascend Technologies, Little Rock, AR
7d · Hybrid

About The Position

This position is primarily remote, with onsite work required multiple times a month in Conway, AR. All applicants must live within 45 minutes of Conway, AR.

Purpose

The Lead Azure Operations Engineer serves as the senior technical execution leader within the Azure Operations function. This role is responsible for all duties of the Azure Operations Engineer while also providing operational leadership, escalation governance, technical quality oversight, and continuous improvement direction for the Operations Support and Development (OSD) team. The Lead Azure Operations Engineer acts as the primary technical advisor to the OSD Manager, ensuring consistent operational standards, effective shift execution, high-quality incident response, and progressive Azure capability development across the team. This position balances advanced hands-on technical work with operational coordination, mentorship, and process maturity leadership.

Requirements

  • 5+ years of experience in IT operations, cloud operations, or platform support roles.
  • 3+ years hands-on Azure production support experience.
  • Demonstrated experience handling complex escalations.
  • Strong understanding of incident, problem, and change management processes.
  • Experience mentoring or informally leading technical peers.
  • Strong communication and technical documentation skills.

Nice To Haves

  • Microsoft Certified: Azure Administrator Associate (AZ-104); Azure Fundamentals (AZ-900) or Security, Compliance, and Identity Fundamentals (SC-900) also preferred.
  • Experience with Azure monitoring and security tooling (Azure Monitor, Log Analytics, Defender for Cloud, Sentinel).
  • Experience with automation tooling (PowerShell, Azure CLI, Bicep, Terraform).
  • Experience supporting Agile delivery models.
  • ITIL certification or equivalent process framework knowledge.

Responsibilities

  • Serve as Tier 2 escalation point for Azure-related incidents.
  • Troubleshoot and resolve complex Azure infrastructure, networking, and identity issues.
  • Perform root cause analysis and drive corrective actions.
  • Execute operational changes and release support activities.
  • Monitor and respond to security-related alerts.
  • Apply Azure security best practices.
  • Maintain and improve runbooks and SOPs.
  • Participate in Agile ceremonies and continuous improvement efforts.
  • Support 24/7 operational coverage through rotating shifts.
  • Maintain Azure certification and professional development standards.
  • Serve as the final internal escalation point prior to engineering or architecture engagement.
  • Oversee escalation flow from Tier 1 to Tier 2 to Engineering to ensure proper routing and resolution ownership.
  • Monitor incident queues and backlog health to ensure SLA adherence.
  • Lead response coordination during high-impact or major incidents.
  • Review post-incident analyses and ensure corrective actions are implemented.
  • Assist the OSD Manager with 24/7 coverage planning and shift alignment.
  • Support on-call rotation management and coverage balancing.
  • Ensure effective shift handoffs and operational continuity.
  • Identify staffing or coverage gaps and recommend adjustments.
  • Establish and maintain standards for operational documentation and runbooks.
  • Review and approve updates to runbooks and SOPs.
  • Ensure incident learnings translate into updated documentation.
  • Conduct periodic audits of operational procedures for accuracy and completeness.
  • Identify opportunities to automate repeatable operational tasks.
  • Assist the OSD Manager in tracking and analyzing:
      • Mean Time to Acknowledge (MTTA)
      • Mean Time to Resolution (MTTR)
      • Escalation frequency
      • Alert noise vs. actionable events
      • Backlog aging and SLA adherence
  • Provide technical summaries of operational performance.
  • Recommend improvements to monitoring, alerting, and workflow processes.
  • Review Azure deployments for operational supportability.
  • Validate monitoring, tagging, RBAC, and security baselines.
  • Participate in change reviews to assess operational risk.
  • Ensure production support standards are consistently applied.
  • Identify systemic reliability risks and recommend mitigation strategies.
  • Identify repetitive manual tasks and drive automation initiatives.
  • Promote use of scripting and Infrastructure as Code practices.
  • Collaborate with development teams to improve deployment reliability.
  • Reduce operational noise through monitoring refinement.
  • Drive reduction of recurring incident patterns.

Benefits

  • Along with a competitive salary, we offer a comprehensive benefits package, including health, dental, and vision insurance, retirement savings options, flexible time off (FTO), and professional development opportunities.