Engineer, Production Engineering

Guild.ai
San Francisco, CA
Hybrid

About The Position

We are building the control plane for AI agents in teams and companies. As a Production Engineer, you will own the infrastructure, security, and compliance systems that allow our platform to ship fast and run reliably at scale. This is not a traditional ops role: you will write real code, build security-sensitive product features, audit application code for vulnerabilities, and own the full security and compliance surface of an early-stage company. Your work will span Kubernetes infrastructure, cloud delivery, agent sandboxing, SOC2 compliance, IT systems, and production observability. If you want to own the production backbone for the agent-native era, from a Terraform module to a pentest to an API key implementation, we want to talk.

Requirements

  • 5+ years in Production Engineering, Platform Engineering, or a security-focused infrastructure role, ideally at a fast-growing startup or SaaS company.
  • Strong hands-on experience with Kubernetes and GCP in production; comfortable with Terraform for managing real infrastructure.
  • Strong programming skills (Python, Go, TypeScript, etc.) with a passion for automating away toil.
  • Hands-on experience with compliance frameworks (SOC2), vulnerability management, and secure system design.

Nice To Haves

  • Background with multi-tenant SaaS or enterprise security and procurement requirements.
  • Exposure to AI/ML infrastructure, particularly agent runtimes.
  • Experience building security-sensitive product features alongside infrastructure work.
  • Experience supporting pentests and bug bounty programs.
  • Experience deploying and operating in customer VPCs or other external cloud environments across AWS, Azure, and/or GCP — navigating enterprise networking, security, and access constraints.

Responsibilities

  • Manage and evolve our production and staging infrastructure on GCP (GKE) using Terraform. Own DNS, networking, and environment configuration end-to-end.
  • Deploy and operate within customer VPCs across AWS, Azure, and GCP — adapting to varied infrastructure constraints, security requirements, and enterprise networking configurations.
  • Build and maintain Kubernetes-based sandboxing for agent execution, ensuring that agents operate within strict network boundaries and route through our API gateway rather than having unfettered internet access.
  • Own our observability stack, including OpenTelemetry instrumentation and integrations with New Relic and Splunk, to give the team deep visibility into system performance and agent runtime behavior.
  • Lead infrastructure and operational work to support SOC2 compliance, including audit preparation, evidence collection, and control implementation.
  • Manage our HackerOne engagement — coordinating pentests, triaging incoming bug bounty reports, and driving remediation.
  • Audit application code for security vulnerabilities, contribute security-sensitive product features (e.g., API key management), and ensure product and infrastructure security are coherent end-to-end.
  • Own our IT stack — Okta, device management, and access controls — keeping the company secure as we scale.
  • Design and maintain safe, automated CI/CD workflows supporting rollout strategies like canary and blue-green deployments.
  • Make shipping to production a routine, boring, highly automated non-event.