Prompt Engineer

Pencil

Remote

About The Position

At Pencil, we are building the agentic OS for marketing, moving beyond simple text-in/text-out interfaces to complex multi-agent architectures that ensure professional, brand-safe, and scalable generative AI solutions for global brands and small businesses. We are seeking a Senior Agent Architect who thinks in systems to design the agent logic, tool-calling structures, and evaluation loops that power our core creative engine and bespoke client solutions. This role involves engineering behavior rather than just prompting, bridging the gap between creative intent and machine execution to ensure robust, predictable, and high-fidelity output across text, image, and video. The work is divided into two main pillars: designing and scaling foundational agents for the Pencil platform (Core Systems) and architecting custom workflows for world-class brands requiring specific brand DNA and complex creative logic (Client Solutions).

Requirements

3+ Years of Direct GenAI Experience: Deep, intuitive, and technical understanding of LLMs (GPT-4, Claude, Gemini) and multimodal models (Stable Diffusion, Midjourney, Video Gen).
Systems Thinking: Ability to think about the latent space, the context window, and how one agent's output becomes another’s input.
Technical Proficiency: Basic familiarity with Python, JSON structures, and how API documentation works, or at least an eagerness to pick them up.
Evaluation Obsession: Belief that if you can’t measure a prompt’s performance, you shouldn’t ship it, and familiarity with benchmarking and A/B testing AI outputs.
The "Creative/Technical" Bridge: Ability to sit in a room with a Creative Director and translate "make it feel more punchy" into a temperature adjustment and a few-shot prompting strategy.

Nice To Haves

Prior experience with AI orchestration tools like LangChain, CrewAI, or AutoGen.

Responsibilities

Design and implement multi-agent workflows, including task decomposition, state management, and tool-use (RAG, API integration, etc.).
Implement rigorous evaluation frameworks (e.g., using LLM-as-a-judge, promptfoo, or DSPy) to measure and improve agent performance at scale.
Develop reusable agents and prompt libraries that allow the platform to serve 1,000+ brands with unique voices simultaneously.
Partner with AI Engineering and Product teams to determine which behaviors should be handled via prompting, RAG, or fine-tuning.
Act as the technical lead for complex client deployments, translating high-level creative briefs into deterministic AI workflows.

Benefits

25 days PTO plus public holidays, although we operate a Flexible Time Off scheme.
Health insurance / private medical cover.
Monthly stipend towards wellness, fitness, and learning and development.
Remote - work from anywhere in your home country.
Enhanced parental leave policies, whether you become a parent through birth, adoption or surrogacy.
Access to the Pencil office in The Shard, London, for UK employees.
Flexible working hours.
