About The Position

ZeroFox protects organizations from external threats across the public attack surface, and we're building agentic AI into the core of how that work gets done. You'll define and build the production agentic systems that power ZeroFox's product: the architectures, tooling, and practices that let agents manage state across steps, handle failures without losing context, and know when to escalate to a human. This is deep agent systems work. You'll be solving the hardest problems in multi-step orchestration, reliability, and evaluation for non-deterministic systems operating on adversarial data.

Requirements

  • 7+ years of software engineering experience building production systems, with hands-on experience designing and deploying LLM-based agents
  • Deep knowledge of agent reliability patterns: state management, error boundaries, escalation logic, context management, tool-calling failure modes
  • Experience with agent orchestration frameworks and an understanding of the tradeoffs between adopting an existing framework and building a custom one
  • Strong backend engineering fundamentals: testing, monitoring, deployment, debugging, performance optimization
  • Hands-on experience with retrieval-augmented systems and an understanding of how retrieval quality affects agent behavior
  • Experience building systems that handle adversarial or noisy input (cybersecurity, fraud detection, content moderation, or similar domains)
  • Familiarity with cloud-based AI deployment, including observability, reliability, and cost considerations

Nice To Haves

  • Experience defining agent governance practices (versioning, behavioral telemetry, rollback) in a production environment
  • Background in cybersecurity or an adjacent domain with high-noise, adversarial data
  • Experience with tool-use architectures and integration patterns for connecting agents to external systems
  • Track record of raising engineering capability across a team through mentoring, pairing, or design review

Responsibilities

  • Design and implement production agent architectures: state management, error handling, retry logic, graceful degradation, human-in-the-loop escalation
  • Build evaluation and testing frameworks for non-deterministic agent workflows: offline tests, synthetic data generation, regression checks, and post-deploy monitoring
  • Implement orchestration patterns: multi-agent coordination, tool-calling chains, memory management, context window optimization
  • Define deployment and governance practices: agent versioning, rollback, behavioral telemetry, anomaly detection
  • Instrument agents with tracing, logging, and observability that make production behavior debuggable
  • Establish architectural standards and best practices for agentic development through design reviews, pairing, and mentorship

Benefits

  • Competitive compensation
  • Community-driven culture with employee events
  • Regular catered lunches for in-office work; snacks and drinks available daily
  • Generous time off
  • Comprehensive health benefits & 401(k) plan
  • Fun, modern workspace
  • Respectful and nurturing work environment, where every opinion is heard and everyone is encouraged to take an active part in the organizational culture

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: Not listed
  • Number of Employees: 251-500
