About The Position

As IT Support Manager, you will lead the front-line IT support and operations function responsible for maintaining the health, availability, and performance of data center IT infrastructure. You will manage teams handling incident response, hardware troubleshooting, change execution, and service delivery, with strong exposure to GPU-based servers, enterprise networking, and colocation environments. This is a hands-on leadership role focused on operational execution, team performance, and service quality.

Requirements

  • 3–5+ years of experience in IT support or data center operations, including people management.
  • Strong hands-on experience with server hardware, including exposure to GPU-based systems.
  • Solid understanding of data center operations, networking basics, and structured cabling.
  • Experience leading incident response and operational troubleshooting.
  • Working knowledge of ITIL / ITSM frameworks.
  • Comfortable working with Linux systems and basic command-line tools.
  • Strong organizational skills and ability to prioritize in high-pressure environments.
  • Clear, concise communication skills for technical and non-technical stakeholders.

Nice To Haves

  • Experience in neocloud, hyperscale, or AI/HPC environments
  • Prior ownership of 24/7 support operations
  • ITIL certification
  • Familiarity with GPU health monitoring, firmware, or platform tooling
  • Experience working with colocation facilities

Responsibilities

  • Lead daily IT operations across data center environments, ensuring high availability and SLA adherence.
  • Own incident management, including triage, escalation, coordination, and communication.
  • Drive root cause analysis (RCA) and follow-through on corrective and preventive actions.
  • Ensure operational readiness for GPU-dense infrastructure, including power, cooling, and hardware health monitoring.
  • Manage, schedule, and develop IT support engineers working in shift-based, 24/7 environments.
  • Define and track KPIs, SLAs, and service quality metrics.
  • Provide hands-on guidance during complex troubleshooting scenarios.
  • Maintain consistent operational standards through runbooks, SOPs, and playbooks.
  • Oversee diagnosis and resolution of issues related to servers, GPU systems, networking equipment, and cabling.
  • Manage hardware lifecycle activities, including installations, upgrades, swaps, and decommissioning.
  • Coordinate RMAs, spare parts, inventory accuracy, and asset tracking.
  • Execute approved changes and maintenance activities with minimal risk.
  • Identify recurring issues and drive process improvements to reduce incidents and MTTR.
  • Ensure adherence to ITIL / ITSM operational processes.
  • Act as the operational interface to vendors, OEMs, and colocation providers for day-to-day support issues.
  • Support audits, compliance checks, and operational controls related to asset handling and access.
  • Ensure secure handling, storage, and decommissioning of IT assets.

Benefits

  • Competitive salary and comprehensive benefits package.
  • Opportunities for professional growth within Nebius.
  • Flexible working arrangements.
  • A dynamic and collaborative work environment that values initiative and innovation.