Vultr is looking for an AI Cluster Architect to design and refine large-scale GPU cluster architectures within strict power and infrastructure limits. The role focuses heavily on power-aware design: starting from a fixed power envelope, the architect determines the optimal number of GPUs while accounting for the full stack that must be deployed, including compute nodes, storage systems, networking fabric, cooling, and facility constraints. It requires deep experience with heterogeneous environments, multiple generations of hardware, and end-user requirements. The architect must understand how different GPU SKUs, NICs, switches, and fabrics interact at scale, including their individual and aggregate power and thermal characteristics. They will evaluate multi-plane, rail-optimized, and tiered fabric designs across technologies such as InfiniBand, RoCE, and Spectrum-X to ensure the networking architecture supports the intended GPU count without exceeding facility limits, switch radix, or topology constraints. The role balances customer-specific requirements for compute, storage, and service density, ensuring that the final cluster design maintains acceptable GPU and fabric performance while maximizing the number of usable GPUs within the total power budget.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed
Number of Employees
101-250 employees