About The Position

We're looking for a Senior Azure Cloud Infrastructure Engineer to design, build, and operate a highly resilient, secure, and cost-efficient cloud platform supporting advanced AI workloads in a healthcare environment. This role is responsible for the mission-critical infrastructure, including GPU-based compute, that powers our proprietary foundational AI model while meeting strict requirements for compliance, data protection, and high availability. You will play a key role in ensuring our systems are fault-tolerant, auditable, and continuously optimized for both performance and cost.

You will be building the backbone of the next-generation healthcare AI platform, where reliability, security, and performance directly impact real-world outcomes. This is not just infrastructure; it is critical systems engineering at the intersection of cloud, AI, and healthcare.

Requirements

  • 5–8+ years of hands-on experience with Microsoft Azure cloud infrastructure
  • Proven experience designing high-availability and disaster recovery systems in regulated environments
  • Strong background in healthcare or other compliance-heavy industries
  • Deep expertise in Azure Virtual Machines, VM Scale Sets, and GPU compute
  • Deep expertise in Azure networking (VNets, Private Link, ExpressRoute, firewalls)
  • Deep expertise in Azure storage solutions (Blob, Files, managed disks with redundancy options)
  • Experience implementing compliance frameworks such as HIPAA or SOC 2
  • Strong knowledge of identity and access control (RBAC, Microsoft Entra ID (formerly Azure AD), managed identities)
  • Experience with Kubernetes (AKS) and containerized workloads
  • Proficiency in scripting (Python, Bash, PowerShell)

Nice To Haves

  • Experience with Azure AI ecosystem (Azure Machine Learning, Azure AI Foundry, Cognitive Services)
  • Familiarity with distributed training, model parallelism, and GPU orchestration
  • Experience implementing MLOps pipelines in regulated environments
  • Azure certifications (Solutions Architect Expert, Security Engineer Associate, DevOps Engineer Expert)
  • Experience with zero-downtime deployments and blue/green or canary strategies

Responsibilities

  • Architect and manage highly available, fault-tolerant systems on Microsoft Azure with multi-region redundancy and disaster recovery
  • Design infrastructure with strict adherence to healthcare compliance standards (e.g., HIPAA, HITRUST, SOC 2)
  • Provision and optimize GPU-based environments for AI/ML workloads, including large-scale model training and inference
  • Build secure, zero-trust architectures (private networking, encryption, identity isolation, least privilege access)
  • Implement backup, failover, and business continuity strategies with clearly defined RTO/RPO targets
  • Continuously reduce infrastructure costs through intelligent scaling, reserved capacity, spot instances, and workload optimization
  • Develop Infrastructure as Code (Terraform, Bicep, ARM) for repeatable, auditable deployments
  • Partner with AI/ML teams to productionize and scale foundational models reliably
  • Establish observability across systems (logging, monitoring, alerting) with proactive incident response
  • Conduct architecture reviews, risk assessments, and security audits

Benefits

  • Paid vacation, sick time, and personal days
  • 11 company-paid holidays
  • Quarterly UberEats voucher
  • Monthly Fringe benefits
  • Flexible work schedules
  • Professional development stipend
  • Health, dental, and vision benefits, with employer HSA contribution
  • Short-term disability (STD), long-term disability (LTD), and life insurance
  • 401(k) company match and profit sharing