Principal Engineer, AI Platform

Epic Games
Cary, NC
Remote

About The Position

Epic Games is seeking a Principal Engineer for its AI Platform team. The role is responsible for architecting and building production systems from the ground up, with a focus on an enterprise-grade stack of agentic AI systems. These systems will automate engineering workflows, accelerate developer productivity, and enable new forms of collaboration across Epic's teams. This is foundational infrastructure that will define how AI is used at Epic for the next decade; it operates at massive scale, and every engineer on the team has significant architectural impact.

The Principal Engineer will own the technical direction of the agent infrastructure stack end-to-end: setting architecture, driving alignment, and solving complex distributed systems and security problems. This is a hands-on role that involves writing production code, designing protocols, and being accountable for system reliability.

Requirements

  • 12+ years of software engineering experience, with at least 4 years at staff or principal scope
  • Deep expertise in distributed systems: event-driven architectures, durable execution, service mesh, and multi-tenant platform design
  • Production experience with authentication and authorization infrastructure — OAuth 2.0, OIDC, SPIFFE/SPIRE or equivalent workload identity, token exchange (RFC 8693), and policy engines (OPA, OpenFGA, or comparable)
  • Strong security engineering fundamentals: credential vaulting, secrets management (OpenBao/Vault), audit trail design, and least-privilege access at scale
  • Fluency in at least one compiled, systems-capable language (Go preferred, Rust or C++ acceptable); comfort reading and writing Go microservices is essential given the stack
  • Track record of owning multi-service platform architecture across a full product lifecycle — from design through sustained production operation
  • Exceptional written communication: design documents and architecture reviews that are clear, precise, and influence without authority
  • Hands-on experience building LLM-integrated systems: agent orchestration, tool-use frameworks, MCP (Model Context Protocol), or equivalent agent-to-tool middleware
  • Experience with plugin or extension runtime design — WASM sandboxing, gRPC sidecar patterns, subprocess isolation, or comparable capability security models
  • Familiarity with knowledge graph systems (Neo4j or comparable), vector databases, and hybrid retrieval (semantic + keyword + graph)
  • Experience operating Kubernetes-based platforms: scheduling, workload identity, sidecar injection, and multi-tenancy isolation

Responsibilities

  • Own the end-to-end technical architecture across Geppetto, EMA, Hodor, Multipass, Vektor, and Roost — ensuring each platform is coherent with the others and that the integration seams are well-defined
  • Drive architectural decisions for agent identity and workload authorization (SPIFFE/SPIRE, OIDC, token exchange, policy planes), translating security requirements into implementable designs
  • Establish the patterns for how AI agents authenticate, receive credentials, execute tools, and are audited — and hold the bar for correctness across the stack
  • Lead design reviews for new capabilities, evaluate build vs. buy decisions, and surface technical risk before it becomes production risk
  • Design and implement the Cluster API and provider abstractions for EMA — the layer that orchestrators depend on to launch, drive, and recover headless agent runs across Kubernetes, EC2, and other compute backends
  • Evolve Hodor's plugin runtime (WASM, gRPC sidecar, subprocess multiplexer) and its gateway security posture as external tool surface area grows
  • Architect Vektor's knowledge graph, vector search, and memory consolidation pipeline for org-wide scale — across teams, scopes, and retention horizons
  • Define durability, consistency, and isolation requirements across event-driven architectures (NATS JetStream, Redis) shared by multiple agent platforms
  • Lead the Multipass proposal from strategy into staffed execution — defining the separation of Hodor (human/tool) and Multipass (agent identity) and migrating the existing credential vault
  • Hold the standard for credential security across the stack: AES-256-GCM vault, AAD binding, scope isolation, default-deny policy, and audit completeness
  • Work with Epic's security organization to ensure agent-to-service trust models meet enterprise standards
  • Partner with product, ML, and enterprise platform teams to shape how agent capabilities are exposed to Epic's broader engineering organization
  • Mentor senior and staff engineers across the team; conduct technical interviews and raise the hiring bar
  • Write design documents that become the reference architecture for future work, not just approvals for current work

Benefits

  • Medical insurance
  • Dental insurance
  • Vision HRA
  • Long Term Disability
  • Life Insurance
  • 401k with competitive match
  • Robust mental well-being program through Modern Health (free therapy and coaching for employees & dependents)
  • Events and company-wide paid breaks
  • Unlimited PTO and sick time
  • Paid sabbatical after 7 years of employment