Senior ML Infrastructure Engineer

Orion Innovation • Edison, NJ

About The Position

We're building a large-scale document intelligence platform that processes text files up to 5 TB in size, extracts insights using BERT-class NLP models, and surfaces answers to analysts via a low-latency query interface. The platform runs on Azure Kubernetes Service (AKS) with dedicated GPU node pools, uses KEDA for event-driven autoscaling, and integrates with Azure Data Lake Storage Gen2 and Azure OpenAI. This is a hands-on role at the intersection of platform engineering and applied ML, and it requires someone who is equally comfortable debugging a CUDA out-of-memory error and designing a Kubernetes autoscaling policy. As the Senior ML Infrastructure Engineer, you will own the end-to-end infrastructure layer, from GPU cluster configuration and CUDA runtime management to Kubernetes job orchestration and model serving.

Requirements

Kubernetes / AKS (Expert): Multi-node-pool design, taints/tolerations, autoscaler
GPU node pools (NC/ND series): Device plugin, driver compatibility, resource limits
KEDA: ScaledJob, queue triggers, cooldown tuning
CUDA / cuDNN: Runtime configuration via PyTorch; raw kernel development not required
PyTorch (GPU inference): Batching, FP16, memory management, profiling
Hugging Face Transformers: BERT/DistilBERT/BGE loading, pipeline API, tokenization
Python (production): Async workers, Azure SDK, queue consumers
Azure infrastructure: VNet, private endpoints, Key Vault, ADLS, AD
Docker / Helm: Multi-stage builds, Helm chart authoring
IaC (Terraform / Bicep)

Nice To Haves

IaC (Terraform / Bicep) experience; willingness to learn is acceptable.

Responsibilities

Own the end-to-end infrastructure layer — from GPU cluster configuration and CUDA runtime management to Kubernetes job orchestration and model serving.
