Senior ML Infrastructure Engineer

Orion Innovation, Edison, NJ

About The Position

We're building a large-scale document intelligence platform that processes text files up to 5 TB in size, extracts insights using BERT-class NLP models, and surfaces answers to analysts through a low-latency query interface. The platform runs on Azure Kubernetes Service (AKS) with dedicated GPU node pools, uses KEDA for event-driven autoscaling, and integrates with Azure Data Lake Storage Gen2 and Azure OpenAI.

This is a hands-on role at the intersection of platform engineering and applied ML. It requires someone who is equally comfortable debugging a CUDA out-of-memory error and designing a Kubernetes autoscaling policy. As the Senior ML Infrastructure Engineer, you will own the end-to-end infrastructure layer, from GPU cluster configuration and CUDA runtime management to Kubernetes job orchestration and model serving.

Requirements

  • Kubernetes / AKS (Expert): Multi-node-pool design, taint/toleration, autoscaler
  • GPU node pools (NC/ND series): Device plugin, driver compat, resource limits
  • KEDA: Scaled Job, queue triggers, cooldown tuning
  • CUDA / cuDNN: Runtime config via PyTorch; raw kernel dev not required
  • PyTorch (GPU inference): Batching, FP16, memory management, profiling
  • Hugging Face Transformers: BERT/DistilBERT/BGE loading, pipeline API, tokenization
  • Python (production): Async workers, Azure SDK, queue consumers
  • Azure infrastructure: VNet, private endpoints, Key Vault, ADLS, AD
  • Docker / Helm: Multi-stage builds, Helm chart authoring
  • IaC (Terraform / Bicep)

Nice To Haves

  • Willingness to learn is acceptable

Responsibilities

  • Own the end-to-end infrastructure layer — from GPU cluster configuration and CUDA runtime management to Kubernetes job orchestration and model serving.