Strategic Technical Account Manager, GPU

Vultr
$115,000 - $140,000

About The Position

Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world. With 32 global cloud data center locations, Vultr is trusted by hundreds of thousands of active customers across 185 countries for its flexible, scalable, global Cloud Compute, Cloud GPU, Bare Metal, and Cloud Storage solutions. In December 2024, Vultr announced an equity financing at a $3.5 billion valuation. Founded by David Aninowsky and self-funded for over a decade, Vultr has grown to become the world’s largest privately held cloud infrastructure company.

The GPU-focused Technical Account Manager (TAM) leads the post-sales technical success of customers deploying large-scale AI, training, inference, and high-performance GPU workloads on the company’s platform. This includes customers using NVIDIA GPU clusters, AMD GPU clusters, GPU VMs, and rack-scale bare-metal environments. You will act as a trusted advisor across LLM training, fine-tuning, RAG workloads, distributed training frameworks, storage throughput requirements, multi-GPU scaling, and performance tuning. This role requires deep technical fluency and exceptional customer management skills to help AI/ML teams achieve predictable, cost-efficient, high-performance outcomes.

Requirements

  • 2–5+ years as an AI/ML Engineer, AI/ML Ops, Technical Account Manager, HPC Engineer, Sales/Solutions Engineer or relevant technical role.
  • Strong knowledge of GPU hardware architectures (NVIDIA/AMD), CUDA/ROCm, distributed training, and ML frameworks.
  • Experience with Linux tuning and networking (InfiniBand, RoCE fabrics).
  • Experience with high-performance storage systems (DDN, NetApp, Vast, Weka, etc.).
  • Ability to communicate complex concepts clearly to both executives and engineering teams.

Nice To Haves

  • Prior experience supporting hyperscalers, AI labs, or large cluster deployments.
  • Cloud Native Computing Foundation Certified Kubernetes Administrator (CKA) certification.

Responsibilities

  • Lead onboarding for customers deploying GPU clusters (bare metal, VMs, or hybrid).
  • Advise on cluster design: multi-GPU topology, NVLink/NVSwitch considerations, RDMA, InfiniBand and RoCE Ethernet, networking throughput, and storage IOPS requirements.
  • Guide customers in selecting GPU types and configurations based on workload (training, fine-tuning, inference, embeddings, RAG pipelines).
  • Support distributed frameworks: PyTorch, TensorFlow, DeepSpeed, Megatron, JAX, Ray, Mosaic, Hugging Face, etc.
  • Apply advanced hands-on Kubernetes skills.
  • Apply advanced hands-on SLURM skills.
  • Identify bottlenecks (network, storage, memory bandwidth).
  • Provide tuning recommendations for batch size, mixed precision, parallelization strategies, and checkpointing.
  • Help customers evaluate cost vs. performance tradeoffs (GPU mix, CPU pairing, instance types, cluster sizing).
  • Own the long-term technical strategy across assigned GPU/AI accounts, including hyperscalers, labs, and high-growth AI startups.
  • Host recurring technical review meetings, roadmap reviews, and optimization sessions.
  • Define scaling plans, future GPU reservation needs, and capacity forecasting.
  • Partner with Support, SRE, Networking, NOC, and Product Management & Engineering to resolve high-urgency incidents.
  • Manage outage communications, corrective action plans, and postmortem reviews with customers.
  • Advocate for GPU reliability improvements and influence roadmap priorities.
  • Identify opportunities for expanded clusters, high-speed storage, or networking upgrades.
  • Support Sales with technical validation and architecture diagrams needed for expansion.
  • Provide structured feedback on existing and future GPU offerings, networking fabrics, storage platforms, and upcoming AI/ML platform features.
  • Partner with Product on early access programs (new GPUs, pipelines, orchestration, etc.).

Benefits

  • 100% company-paid insurance premiums for employee medical, dental and vision plans.
  • 401(k) plan that matches 100% up to 4%, with immediate vesting
  • Professional Development Reimbursement of $2,500 each year
  • 11 Holidays + Paid Time Off Accrual + Rollover Plan
  • Commitment matters to Vultr! Increased PTO at 3-year and 10-year anniversaries + 1-month paid sabbatical every 5 years + Anniversary Bonus each year
  • $500 stipend for remote office setup in first year + $400 each following year
  • Internet reimbursement up to $75 per month
  • Gym membership reimbursement up to $50 per month
  • Company paid Wellable subscription