AI Solutions Specialist

Jobgether
4d · Hybrid

About The Position

In this role, you will lead the design, optimization, and deployment of AI, GPU virtualization, and high-performance computing (HPC) solutions, supporting cutting-edge workloads across hybrid cloud and on-premises environments. You will work closely with engineering, data science, and operations teams to enhance AI inference, optimize GPU clusters, and ensure efficient, scalable infrastructure for complex workloads. Your expertise will directly affect the performance and reliability of AI and HPC systems used in demanding industries such as research, healthcare, finance, and autonomous technologies. This is a hands-on technical role that combines system optimization, model acceleration, and strategic implementation of advanced AI technologies. The position offers collaboration with top-tier professionals and exposure to the latest innovations in AI infrastructure and high-performance storage.

Requirements

  • 5+ years of experience in AI solution deployment, GPU virtualization, and HPC infrastructure optimization
  • Proficiency in AI frameworks (TensorFlow, PyTorch) and GPU acceleration (CUDA, NVIDIA vGPU, GPUDirect)
  • Experience managing and scaling large HPC clusters and GPU-driven workloads
  • Expertise in cloud infrastructure management (AWS, Azure, Google Cloud) and hybrid environments
  • Knowledge of distributed systems and of containerization and orchestration (Docker, Kubernetes)
  • Strong programming and scripting skills (Python, C++, CUDA) for performance tuning and automation
  • Familiarity with high-performance interconnects (RDMA, InfiniBand) and storage optimization
  • BS or MS degree in Computer Science, Engineering, or a related technical field
  • Excellent problem-solving, communication, and collaboration skills; ability to work in a fast-paced, innovative environment

Responsibilities

  • Design and implement advanced GPU virtualization solutions, including GPUDirect Storage, to optimize AI inference and HPC workloads
  • Manage and optimize large-scale GPU and HPC clusters for performance, availability, and scalability
  • Collaborate with data science and engineering teams to optimize AI models using frameworks such as TensorFlow and PyTorch, leveraging CUDA for acceleration
  • Architect and deploy hybrid cloud solutions integrating on-premises and cloud-based infrastructure for high-performance workloads
  • Monitor, assess, and improve system performance using RDMA, InfiniBand, and high-bandwidth interconnects
  • Develop and maintain technical documentation, best practices, and operational procedures
  • Build and maintain strategic relationships with technology partners, hardware vendors, and customers to integrate industry-leading solutions

Benefits

  • Competitive salary based on experience and location
  • Flexible remote or hybrid work arrangements
  • Comprehensive medical, dental, and vision insurance
  • Paid time off and wellness programs
  • Professional development, training, and conference opportunities
  • Access to cutting-edge AI, GPU, and HPC technologies
  • Opportunity to work in a highly innovative, collaborative, and growth-oriented environment