AI Infrastructure Operations Engineer

Private Health Management
$120,000 - $140,000 • Remote

About The Position

PHM is building and scaling Companion, an AI-enabled clinical platform operating in a high-trust healthcare environment where reliability, observability, and security are foundational requirements. The platform includes headless AI agents designed to support clinical and operational professionals by acting as intelligent workstations that integrate with enterprise applications and workflows. The AI Infrastructure & Operations Engineer will operationalize the platform so it runs reliably at production scale, helping ensure the systems behind Companion are observable, recoverable, secure, and maintainable as adoption grows.

This role sits at the intersection of Kubernetes operations, AI platform reliability, observability engineering, and operational security. You will help evolve and maintain the Azure-based infrastructure stack while partnering closely with technology leadership, AI architects, and security stakeholders. This is a high-ownership role for someone who thrives in fast-moving environments, is comfortable operating with incomplete information, and enjoys building operational discipline around emerging AI systems.

Requirements

  • Strong hands-on Kubernetes operations experience, including troubleshooting workloads, admission controllers, cluster networking, and production incidents.
  • Experience supporting cloud-native infrastructure in Azure environments, particularly AKS and related operational tooling.
  • Demonstrated strength in monitoring, observability, and incident response using structured logging and metrics platforms.
  • SRE mindset with experience handling on-call responsibilities, operational prioritization, and post-incident analysis.
  • Comfort operating in fast-moving environments with incomplete documentation, evolving processes, and broad ownership areas.
  • Strong communication and collaboration skills with the ability to explain technical issues clearly across technical and non-technical audiences.

Nice To Haves

  • Experience with CI/CD pipeline tooling including GitHub Actions, Kaniko, cosign, image signing, or Actions Runner Controller.
  • Familiarity with Infrastructure as Code practices using Bicep or Azure resource automation tooling.
  • Exposure to HIPAA, SOC 2, or other compliance-aware operational environments.
  • Experience supporting AI or LLM-backed applications in production environments.

Responsibilities

  • Establish operational reliability for Companion across AKS infrastructure, AI agent workloads, monitoring systems, and deployment pipelines.
  • Build meaningful observability practices that help PHM understand platform behavior, usage trends, and operational risks before they become incidents.
  • Create sustainable operational hygiene around patching, CVE remediation, secrets rotation, dependency management, and cloud maintenance cycles.
  • Strengthen platform resilience, documentation, and operational processes so the environment can scale without relying on tribal knowledge.

Operate and Improve Platform Reliability:

  • Monitor and maintain AKS infrastructure, AI agent workloads, deployment pipelines, and supporting Azure services.
  • Investigate incidents, troubleshoot production issues, and improve platform resilience through better operational patterns and tooling.
  • Support release operations and help ensure deployments remain stable, observable, and recoverable.

Build Observability and Operational Insight:

  • Develop dashboards, alerts, logging patterns, and operational baselines using Azure Log Analytics and Application Insights.
  • Identify system trends, performance bottlenecks, and emerging operational risks across infrastructure and AI workloads.
  • Improve visibility into AI agent behavior, enterprise workflow integrations, latency patterns, and system health under real user load.

Strengthen Security and Operational Hygiene:

  • Maintain operational cadence for dependency updates, CVE remediation, image signing, secrets rotation, and cluster patching.
  • Support security-first infrastructure practices across Kubernetes, CI/CD pipelines, and Azure environments.
  • Partner with security and engineering stakeholders to maintain compliance-aware operational practices in a HIPAA-regulated environment.

Collaborate Across a Small, High-Ownership Team:

  • Work closely with technology leadership, platform engineers, security stakeholders, and AI architects to evolve the operational maturity of Companion.
  • Contribute documentation, operational runbooks, and shared knowledge that reduce platform fragility over time.
  • Help establish practical operational patterns for AI systems where industry best practices are still emerging.

Benefits

  • health/dental/vision benefits
  • annual cash incentive program
  • 401k with match
  • flexible PTO
  • PHM for PHM: our services for you and your dependents
