AI Infrastructure Engineer San Jose, CA

ESRhealthcare and EXEC STAFF RECRUITERS
San Jose, CA
Onsite

About The Position

Architect and build custom Artificial Intelligence (AI) infrastructure solutions leveraging the Nutanix Kubernetes Platform and Nutanix AI. You will be responsible for designing high-performance computational stacks that integrate Nutanix AI, high-speed software-defined storage, and GPU-accelerated nodes. Your mission is to make AI infrastructure "invisible" by optimizing for performance, power consumption, and seamless hybrid-multicloud scalability across on-prem and public cloud environments. As an AI Infrastructure Engineer, you will design tailored AI solutions that bridge the gap between private data centers and public cloud. Your day-to-day will involve optimizing the Nutanix computational stack for large language models (LLMs) and generative AI workloads. You will serve as the subject matter expert (SME) for Nutanix AI, ensuring that compute, storage (Nutanix Objects/Files), and networking (Flow) are tuned for AI model training and inference.

Requirements

  • AI
  • Kubernetes
  • Orchestration
  • DevOps
  • 10 years minimum experience
  • 12 years full-time education
  • Deep proficiency in AOS (Acropolis Operating System) and AHV (Native Hypervisor).
  • Experience with GPU Passthrough and vGPU configurations on Nutanix to optimize AI training performance.
  • Experience applying Nutanix Flow for microsegmentation to secure sensitive AI training data.
  • Experience using Nutanix Cloud Manager (NCM) Cost Governance to monitor and optimize spend across hybrid environments.
  • Hands-on experience with Prometheus, Grafana, ELK, and OpenTelemetry.

Nice To Haves

  • Nutanix AI
  • Nutanix Objects/Files
  • Nutanix Calm
  • Terraform
  • Nutanix Flow
  • NC2 on Prem
  • Nutanix Kubernetes Platform (NKP)
  • Nutanix Cloud Manager (NCM) Cost Governance

Responsibilities

  • Design seamless AI workflows using NC2 on Prem, allowing for rapid bursting of AI workloads from on-prem AHV clusters to the public cloud.
  • Architect high-performance storage backends using Nutanix Objects (S3-compatible) to handle the massive datasets required for AI/ML.
  • Deploy and manage AI workloads using Nutanix Kubernetes Platform (NKP) to ensure containerized AI models are scalable and resilient.
  • Implement IaC using Nutanix Calm or Terraform to automate the lifecycle of GPU-enabled nodes.
  • Design observability frameworks (monitoring, logging, alerting) for proactive issue detection.
  • Ensure high availability, disaster recovery, and fault tolerance across all systems.
  • Apply Zero-Trust architecture principles across enterprise networking, storage, and virtualization.
  • Modernize legacy 3-tier AI silos into a unified, web-scale Nutanix environment.
  • Act as the primary technical authority for Nutanix AI integrations within the San Jose office.
  • Work across teams to dismantle data silos, moving the organization toward a "One Platform" philosophy.
  • Stay ahead of Nutanix product roadmaps to inform long-term AI infrastructure strategy.