About The Position

We are seeking a Senior Operations & Maintenance Support Specialist to provide advanced technical support with a specialized focus on Kubernetes and Virtual Desktop Infrastructure (VDI) in a government environment. This role combines Tier 1 and Tier 2 Site Reliability Engineering (SRE) support responsibilities with deep technical expertise in Azure Kubernetes Service (AKS) and VDI technologies. The ideal candidate will serve as a technical escalation point, mentor junior team members, and provide expert-level troubleshooting for complex cloud infrastructure and application issues while ensuring system reliability and an optimal user experience. US citizenship and an active TS/SCI security clearance are required for this position. A sign-on bonus is available.

Requirements

  • Must be a US citizen
  • Education: Bachelor's degree in Computer Science, Information Technology, or related technical field
  • Experience: Minimum 6 years of professional experience in technical support, operations, SRE, or infrastructure engineering roles
  • Security Clearance: Active TS/SCI clearance (Required)
  • Kubernetes Expertise: Advanced hands-on experience with Kubernetes or Azure Kubernetes Service (AKS), including troubleshooting pods, services, deployments, networking, and cluster operations
  • VDI Expertise: Deep experience with Virtual Desktop Infrastructure technologies, particularly Azure Virtual Desktop (AVD), including architecture, troubleshooting, and optimization
  • Configuration Management: Advanced proficiency with Salt, Ansible, or similar configuration management tools, including experience developing complex automation
  • Containerization: Strong experience with Docker, container troubleshooting, and containerized application support
  • SRE Practices: Understanding of Site Reliability Engineering principles including monitoring, incident response, and system reliability
  • Cloud Platforms: Advanced knowledge of Azure cloud services with a focus on compute, networking, and storage
  • Operating Systems: Expert-level troubleshooting in Windows and Linux environments
  • Monitoring & Observability: Experience with Azure Monitor, Prometheus, Grafana, or similar monitoring tools for Kubernetes and infrastructure
  • Scripting & Automation: Proficiency in PowerShell, Python, or Bash for automation and troubleshooting
  • Incident Management: Advanced experience with Jira or similar systems, including leading complex incident resolution
  • Technical Leadership: Proven ability to mentor team members and provide technical guidance

Responsibilities

  • Serve as the first point of contact for end-user technical support requests related to cloud services, hosted applications, and VDI functionality, providing responsive customer service and performing in-depth troubleshooting
  • Create, update, and maintain technical support tickets in Jira with accurate documentation, proper categorization, and tracking of resolution times to maintain SLA compliance
  • Escalate complex issues to Tier 2/3 support teams, SREs, and Platform Engineers with comprehensive documentation, coordinating with Development, DevOps, and Architecture teams for resolution
  • Manage VDI technologies including troubleshooting and repairing User Virtual Machines (UVMs), building and deploying base images for supported operating systems, and patching running UVMs
  • Build, package, deploy, and manage Salt States to install applications in virtual machines, operating the Salt Master for application deployment across the VDI environment
  • Monitor system performance, identify patterns indicating systemic issues, and support maintenance windows with user communication during planned outages
  • Maintain and update knowledge base articles, contribute to support procedures and troubleshooting guides, and participate in continuous service improvement initiatives
  • Work with cloud providers for Tier 1 and Tier 2 support issues, escalating Tier 3+ issues appropriately
  • Other duties as assigned