Senior ML Infrastructure Engineer

Orion Innovation, Edison, NJ

About The Position

We're building a large-scale document intelligence platform that processes text files up to 5 TB in size, extracts insights using BERT-class NLP models, and surfaces answers to analysts through a low-latency query interface. The platform runs on Azure Kubernetes Service (AKS) with dedicated GPU node pools, uses KEDA for event-driven autoscaling, and integrates with Azure Data Lake Storage Gen2 and Azure OpenAI. This is a hands-on role at the intersection of platform engineering and applied ML. It requires someone who is equally comfortable debugging a CUDA out-of-memory error and designing a Kubernetes autoscaling policy. As the Senior ML Infrastructure Engineer, you will own the end-to-end infrastructure layer, from GPU cluster configuration and CUDA runtime management to Kubernetes job orchestration and model serving.

Requirements

  • Kubernetes / AKS: Multi-node-pool design, taint/toleration, autoscaler
  • GPU node pools (NC/ND series): Device plugin, driver compat, resource limits
  • KEDA: ScaledJob, queue triggers, cooldown tuning
  • CUDA / cuDNN: Runtime config via PyTorch; raw kernel dev not required
  • PyTorch (GPU inference): Batching, FP16, memory management, profiling
  • Hugging Face Transformers: BERT/DistilBERT/BGE loading, pipeline API, tokenization
  • Python (production): Async workers, Azure SDK, queue consumers
  • Azure infrastructure: VNet, private endpoints, Key Vault, ADLS, AD
  • Docker / Helm: Multi-stage builds, Helm chart authoring
  • IaC (Terraform / Bicep)

Nice To Haves

  • Willingness to learn is acceptable

Responsibilities

  • Own the end-to-end infrastructure layer — from GPU cluster configuration and CUDA runtime management to Kubernetes job orchestration and model serving.