About The Position

As a Staff Platform Engineer - AI Infrastructure, you will build and scale the infrastructure behind Paytm's AI inference platform, which serves both internal teams and enterprise customers and supports new customer use cases from the ground up.

You will own GPU infrastructure, model hosting and serving, and multi-model routing across modalities. This includes managing Paytm's own coding and domain-specific models (voice, vision, risk, fintech workflows) as well as third-party models on shared GPU and accelerator clusters. You will also develop self-service platforms that let teams provision compute, deploy and customize models, and manage resources through APIs and control planes, eliminating the need to rebuild infrastructure for each AI use case.

Your contributions will establish the AI control plane for Paytm Intelligence (Pi), encompassing policy-driven routing, quotas, observability, and visibility into usage and costs. This work directly affects how quickly agents and AI features ship, how reliably they run, and how efficiently hardware is utilized across domains such as payments, risk, fraud, collections, support, and developer experience.

Requirements

  • 8+ years of software engineering experience, including 3+ years building infrastructure platforms or ML/AI infrastructure
  • Deep experience with cloud infrastructure (AWS, GCP) and Kubernetes
  • Hands-on experience with GPU workloads and model serving (vLLM, TensorRT-LLM, Triton, or similar)
  • Strong software engineering fundamentals in Python, Go, or C++
  • Experience with infrastructure-as-code (Terraform, Pulumi, CDK)
  • Experience designing self-service platforms or internal developer tooling
  • Understanding of model optimization: quantization, batching, and serving architectures
  • Proven ability to lead complex cross-team technical initiatives
  • Strong communication skills and the ability to influence technical direction

Nice To Haves

  • Experience building or operating inference infrastructure at scale
  • Experience with CUDA, GPU scheduling, or hardware-level optimization
  • Experience with multi-model serving across different modalities
  • Experience with edge inference or on-device model deployment
  • Experience with model fine-tuning infrastructure (LoRA, QLoRA, PEFT)
  • Background in fintech or regulated industries

Responsibilities

  • Design and operate GPU infrastructure for model hosting, including provisioning, scheduling, and cost optimization across cloud and on-premise environments
  • Build and scale model serving systems using vLLM, TensorRT-LLM, Triton, or equivalent, supporting real-time inference with strong latency and availability guarantees
  • Implement multi-model routing to serve multiple models across modalities (text, voice, code, vision) on shared infrastructure
  • Own the model lifecycle end to end: download, deploy, serve, monitor, swap, and scale
  • Drive inference optimization including quantization strategies (AWQ, GPTQ), batching, caching, and cold start reduction
  • Build self-service infrastructure platforms where teams provision compute, storage, and model endpoints through APIs and control planes
  • Implement infrastructure-as-code at scale using Terraform, Pulumi, or CDK
  • Build observability and reliability for inference systems: SLIs/SLOs, GPU utilization monitoring, latency tracking, automated capacity planning, and alerting
  • Define platform standards and governance including multi-tenant isolation, cost attribution, and resource quotas
  • Lead architectural design and influence engineering direction across the AI infrastructure stack