About The Position

Seeking skilled professionals to join our Azure VMware Solution (AVS) operations support team. In this role, you will be responsible for providing comprehensive 24/7 technical support for Microsoft's cloud-based VMware infrastructure. The ideal candidate will excel in a fast-paced environment, delivering critical support services across time zones to ensure maximum platform availability and performance. This position requires strong diagnostic skills, effective communication with customers during critical incidents, and the ability to drive swift resolution of service-impacting events.

Requirements

  • BS in Computer Science or other technical discipline is preferred.
  • 5+ years of experience diagnosing/debugging faults in complex online services
  • Hold active DoD Secret security clearance and CJIS adjudication to maintain USME access
  • Ability to identify problems that can be automated, script solutions for them, and work with efficiency in mind
  • Experience with PowerShell, SQL, and Python scripting
  • Experience with VMware vSphere infrastructure management, including vCenter Server, ESXi host troubleshooting, and NSX-T networking components within Azure VMware Solution
  • Proficiency in diagnosing and resolving Azure VMware Solution connectivity issues, including ExpressRoute circuits, HCX migration tools, and vSAN storage performance optimization
  • Able to diagnose and mitigate faults
  • Able to identify and drive recovery levers with feature teams
  • Able to communicate effectively through written and oral English
  • Able to interact with external customers and partners on behalf of Microsoft
  • Ability to perform work under continuous deadline pressure
  • Ability to execute work with precision in time sensitive outage scenarios
  • Ability to effectively communicate changes in incident status and impact

Responsibilities

  • Troubleshoots complex issues related to VMware vSphere infrastructure, NSX-T networking, vSAN storage, and ExpressRoute connectivity within the AVS environment.
  • Participates in on-call rotations, collaborates with cross-functional engineering teams, and continuously improves operational efficiency through automation and process refinement.
  • Responds to incident tickets in an operational environment to meet SLA objectives. Typically responds to the more complex incidents.
  • Troubleshoots system issues using diagnostic tools such as netmon, WinDbg, and custom application tools.
  • Reviews system logs to identify and mitigate system issues.
  • Leverages the knowledge base to help troubleshoot, identify, and resolve system issues.
  • Updates knowledge base troubleshooting guides and lessons learned as required.
  • Documents incident fixes and makes recommendations to the engineering team for system improvements to be considered in future releases.
  • Documents system issues that result in outages and coordinates changes through the change management process.
  • Supports collaboration across operations, development teams and external partners.
  • Supports "tiger team" calls to streamline knowledge sharing and timely resolution of system issues.
  • Monitors solution performance according to client specification and SLAs. Serves as an escalation point on more complex issues.