AI Model Serving Specialist

Rackspace
$82,300 - $140,580

About The Position

Enable enterprise customers to operationalize AI workloads by deploying and optimizing model-serving platforms (e.g., NVIDIA Triton, vLLM, KServe) within Rackspace’s Private Cloud and Hybrid environments. This role bridges AI engineering and platform operations, ensuring secure, scalable, and cost-efficient inference services.

Requirements

  • Hands-on experience with NVIDIA Triton, vLLM, or similar serving stacks.
  • Strong knowledge of Kubernetes, GPU scheduling, and CUDA/MIG.
  • Familiarity with VMware VCF9, NSX-T networking, and vSAN storage classes.
  • Proficiency in Python and containerization (Docker).
  • Understanding of observability stacks (Prometheus, Grafana) and FinOps principles.
  • Exposure to RAG architectures, vector DBs, and secure multi-tenant environments.
  • Excellent problem-solving and customer-facing communication skills.

Nice To Haves

  • NVIDIA Certified Professional (AI/ML)
  • Kubernetes Administrator (CKA)
  • VMware VCF Specialist
  • Rackspace AI Foundations (internal)

Responsibilities

Model Deployment & Optimization

  • Package and deploy ML/LLM models on Triton, vLLM, or KServe within Kubernetes clusters.
  • Tune performance (batching, KV-cache, TensorRT optimizations) to meet latency and throughput SLAs.

Platform Integration

  • Work with VMware VCF9, NSX-T, and vSAN ESA to ensure GPU resource allocation and multi-tenancy.
  • Implement RBAC, encryption, and compliance controls for sovereign/private cloud customers.

API & Service Enablement

  • Integrate models with Rackspace’s Unified Inference API and API Gateway for multi-tenant routing.
  • Support RAG and agentic workflows by connecting to vector databases and context stores.

Observability & FinOps

  • Configure telemetry for GPU utilization, request tracing, and error monitoring.
  • Collaborate with FinOps to enable usage metering and chargeback reporting.

Customer Engineering Support

  • Assist solution architects in onboarding customers and creating reference patterns for BFSI, Healthcare, and other verticals.
  • Provide troubleshooting and performance benchmarking guidance.

Continuous Improvement

  • Stay current with emerging model-serving frameworks and GPU acceleration techniques.
  • Contribute to reusable Helm charts, operators, and automation scripts.

Benefits

  • Our compensation reflects the cost of labor across several US geographic markets.
  • The base pay for this position ranges from $82,300/year in our lowest geographic market up to $140,580/year in our highest geographic market.
  • Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience.
  • Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
  • The compensation package may also include incentive compensation opportunities in the form of annual bonus or incentives, equity awards and an Employee Stock Purchase Plan (ESPP).
  • Learn more about benefits at Rackspace.


What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No education requirement listed
  • Number of Employees: 1,001-5,000 employees
