Sr. Infrastructure & Security Engineer

HerculesAI, Campbell, CA

About The Position

We are seeking a Sr. Infrastructure & Security Engineer to provision and optimize GPU compute across various cloud providers, design and maintain IaC foundations for AI systems, implement policy-as-code guardrails, and enforce zero-trust architectures. This role involves configuring perimeter security, managing DNS and certificate lifecycles, leading vulnerability management, and partnering with customer success teams to secure customer environments. The engineer will also build CI/CD pipelines with security scanning, deploy and manage secure Kubernetes clusters, implement observability, and lead incident response and compliance efforts.

Requirements

  • Proven expertise with IaC tooling (Terraform/Pulumi), policy-as-code, and scripting (Python, Bash, PowerShell)
  • Hands-on GPU compute provisioning across major cloud and specialized providers
  • Experience with Cloudflare or equivalent CDN/WAF/DDoS platforms for perimeter security and Zero Trust
  • Strong background in AWS, Azure, GCP, and on-prem infrastructure with secure architecture focus
  • Proficiency in Kubernetes and Docker, including container security, GPU scheduling, and runtime protection
  • Deep understanding of network security, zero-trust principles, IAM/RBAC, and secrets management
  • CI/CD experience with integrated security scanning
  • Ability to conduct security assessments, threat modeling, and work directly with customers

Responsibilities

  • Provision and optimize GPU compute across AWS, Azure, GCP, and specialized providers (CoreWeave, Lambda Labs), including Kubernetes GPU orchestration and hardware evaluation (NVIDIA H100/B200, AMD MI300X, Intel Gaudi)
  • Design and maintain IaC foundations (Terraform, Pulumi, Helm) for agentic AI systems, including agent orchestration platforms, RAG stacks, vector databases, and model serving endpoints
  • Implement policy-as-code guardrails (OPA, Sentinel, Kyverno) for autonomous agent workloads
  • Design and enforce zero-trust architectures with network segmentation, IAM/RBAC least-privilege, and secrets management (Vault, AWS Secrets Manager)
  • Configure and manage Cloudflare (or equivalent) for DDoS protection, WAF, bot management, SSL/TLS termination, and Zero Trust access
  • Manage DNS and email authentication security (DNSSEC, DMARC, SPF, DKIM), certificate lifecycle, and API security controls (mTLS, token management)
  • Lead vulnerability management, penetration testing coordination, and CIS benchmarking
  • Partner with customer success teams to assess, secure, and threat-model customer deployment environments
  • Build and maintain CI/CD pipelines (GitHub Actions, GitLab CI) with integrated security scanning (SAST, DAST, SCA, container scanning)
  • Deploy and manage Kubernetes clusters across cloud and on-prem with security-hardened, GPU-enabled configurations
  • Implement observability (Prometheus, Grafana, Splunk, Datadog) and SIEM integrations
  • Lead incident response and drive compliance (SOC 2, ISO 27001, HIPAA, FedRAMP) through audit automation