Senior Platform & Infrastructure Engineer

Nexus Health Systems Ltd
Houston, TX
Hybrid

About The Position

The Senior Platform & Infrastructure Engineer is the principal technical contributor within the NXHS Corporate IT Platform & Operations team. This is a dual-mission role. The primary focus is designing, building, and operationalizing AI and agentic automation solutions that transform clinical and business operations across all Nexus Health Systems facilities. The secondary focus is enterprise infrastructure engineering: ensuring that the compute, networking, identity, and cloud platforms underpinning these AI systems and all hospital IT operations are reliable, secure, and HIPAA-compliant.

NXHS currently runs an NVIDIA stack for on-premises agentic AI, but the organization maintains platform flexibility and may pivot to, or incorporate, Azure AI Foundry, AWS agentic services, or other emerging platforms as the landscape evolves. The right candidate is not married to a single stack; they are fluent across cloud and on-premises AI platforms and can adapt as strategic direction shifts.

This role works directly alongside the Senior Manager, Platform & Operations on R&D initiatives, including multi-agent AI architectures, LLM orchestration, RPA with agentic bolt-ons, and enterprise integration development. It also collaborates closely with the Senior Data Engineer on data layer architecture, ensuring AI agents can safely and efficiently query, interpret, and act on data within the SQL Data Warehouse. The ideal candidate is as comfortable architecting an agent swarm with persistent memory over a SQL data warehouse as managing Azure hybrid infrastructure and enterprise networking for a multi-site healthcare system.

Requirements

  • Bachelor’s degree in Computer Science, Information Technology, or a related field required; equivalent professional experience considered.
  • 10+ years of hands-on experience in infrastructure engineering or platform development, with demonstrated ability to operate at a senior level across enterprise environments.
  • 2+ years of hands-on AI/ML engineering experience, building with LLMs, agent frameworks, RAG pipelines, or agentic automation in a real environment (not just coursework or tutorials).
  • Demonstrated experience building and deploying AI/ML solutions beyond proof-of-concept, whether in production, internal tooling, or serious R&D.
  • Experience with multi-agent system design patterns: shared vs. isolated memory, message bus architectures, agent specialization, and tool-use frameworks.
  • Experience supporting healthcare EHR platforms (Meditech, Epic, or similar) from an infrastructure perspective.
  • SQL proficiency sufficient to partner daily with the Senior Data Engineer on warehouse schema design, query optimization, data pipeline architecture, and AI agent data access patterns.
  • Comfort operating with minimal vendor support on bleeding-edge platforms, and a proven ability to work through ambiguity using documentation, experimentation, and community engagement.
  • Certifications: Azure Administrator, Azure AI Engineer Associate, Azure Solutions Architect, AWS Certified Solutions Architect, AWS Certified Machine Learning Engineer, NVIDIA Certified Professional, or equivalent.
  • LLM orchestration frameworks: LangChain, LlamaIndex, Semantic Kernel, or equivalent agent-building toolkits.
  • Vector databases and embedding pipelines (Pinecone, Qdrant, pgvector, or equivalent) for RAG and agent memory architectures.
  • NVIDIA AI stack: Nemotron models, NVIDIA NeMo Guardrails, DGX administration, and GPU compute management. Must be willing to pivot if the organization adopts alternative on-premises or cloud-native agentic platforms.
  • Cloud AI services across multiple providers: Azure AI Foundry (Azure OpenAI, Cognitive Services), AWS AI/agentic services (Bedrock, SageMaker), or equivalent. Must be comfortable operating across cloud boundaries rather than depending on a single platform.
  • Python development for AI/ML workflows, API development (FastAPI, Flask), and scripting/automation.
  • REST API design, webhook architectures, OAuth/app registration, and Microsoft Graph API integration for programmatic access to M365 services.
  • RPA platforms and intelligent automation design, with an understanding of how agentic AI extends traditional RPA capabilities.
  • Microsoft Azure ecosystem: hybrid cloud architecture, identity (Entra ID), networking, and security.
  • Enterprise virtualization (VMware or Hyper-V), Windows Server and Linux administration, and OS hardening.
  • Enterprise networking: TCP/IP, VLANs, routing/switching, firewall management, VPN technologies.
  • Infrastructure-as-Code (Terraform preferred) and PowerShell scripting for automation and configuration management.
  • Microsoft 365 enterprise administration, Microsoft Defender security stack, and HIPAA Security Rule requirements for infrastructure.

Nice To Haves

  • Healthcare IT experience preferred, particularly supporting clinical environments with 24/7 uptime requirements.
  • Hands-on experience with the NVIDIA AI stack preferred; a strong aptitude to learn it is required.
  • Terraform preferred for Infrastructure-as-Code.

Responsibilities

  • Design, build, and operationalize multi-agent AI systems on the current NVIDIA stack, while maintaining the ability to architect equivalent solutions on Azure AI Foundry, AWS agentic services, or other platforms as the organization’s strategic direction evolves. This includes agent orchestration, swarm architectures, task decomposition, and inter-agent communication patterns for clinical and operational use cases.
  • Architect and implement memory permanence and learning-over-time capabilities for AI agents, including vector store design, RAG (Retrieval-Augmented Generation) pipelines, embedding strategies, and state management across agent sessions.
  • Build integration layers between AI services and enterprise platforms, including Microsoft 365 (Graph API, core services), SQL Data Warehouse, and clinical systems, enabling agents to consume and act on organizational data.
  • Develop and deploy LLM-powered solutions using orchestration frameworks (LangChain, LlamaIndex, Semantic Kernel, or equivalent), including prompt engineering at a systems level, tool/function calling architectures, and chain-of-thought workflows.
  • Design and implement RPA (Robotic Process Automation) workflows with agentic AI bolt-ons, automating clinical and administrative processes that currently require manual intervention.
  • Spin up, configure, and manage AI model deployments across multiple platforms, including on-premises NVIDIA GPU infrastructure, Azure AI Foundry, and AWS agentic/AI services. This work spans model selection, fine-tuning, and performance optimization. The organization actively evaluates and pivots between platforms; vendor lock-in is not acceptable.
  • Build REST APIs, webhooks, middleware, and connector services that bridge AI/agent outputs to front-end applications, enabling end users to interact with intelligent systems through web-based interfaces and internal platforms.
  • Partner closely with the Senior Data Engineer on all data layer work, including enabling AI agent access to the SQL Data Warehouse, designing query patterns for autonomous retrieval, building ETL-to-agent handoff points, co-developing data schemas that support both BI reporting and agentic consumption, and establishing guardrails for autonomous data operations in a HIPAA-governed environment. This is a daily working relationship, not a periodic handoff.
  • Conduct hands-on R&D on emerging AI platforms, tools, and architectures with limited vendor documentation or community support. Ability to pioneer in ambiguous technical territory is essential.
  • Design, engineer, and operate enterprise infrastructure platforms across on-premises and hybrid environments, including compute, virtualization (VMware vSphere / Hyper-V), storage, and backup/DR solutions that protect patient data and clinical systems.
  • Architect and manage Microsoft Azure hybrid cloud environments, including compute, networking, identity (Entra ID), and security services aligned with NXHS compliance requirements.
  • Develop Infrastructure-as-Code (IaC) using Terraform for automated, auditable provisioning across clinical and administrative environments.
  • Architect, manage, and troubleshoot enterprise networking (LAN/WAN, VLANs, routing/switching, wireless, VPN, firewall) across corporate and clinical facilities.
  • Administer Microsoft 365 tenant services (Exchange Online, SharePoint, Teams), including security configuration, DLP, and retention policies aligned with HIPAA requirements.
  • Ensure infrastructure configurations comply with HIPAA, CIS benchmarks, and organizational security baselines. Partner with cybersecurity and IT leadership on identity governance, vulnerability remediation, and infrastructure hardening.
  • Deploy and manage the Microsoft Defender security stack (Endpoint, Servers, Cloud, Identity) across hybrid infrastructure.
  • Serve as the primary R&D partner to the Senior Manager, Platform & Operations on AI and agentic initiatives, picking up technical threads independently when leadership bandwidth is constrained.
  • Partner with Clinical Informatics, Data Engineering, and Service Delivery teams to ensure platform readiness for AI-powered application deployments, clinical system implementations, and enterprise modernization.
  • Evaluate emerging AI platforms, agent frameworks, and infrastructure capabilities; deliver strategic recommendations to IT leadership on architecture decisions that will define the next 3 years of NXHS technology.