Lead AI Agent Engineer (Prompting & Evaluation)

Myria • Los Angeles, CA

58d • Remote

About The Position

Myria builds the private marketplace for the 300,000 most successful people in the world. We power bespoke services, access, and transactions that don't exist anywhere else. We're a YC W22 company with fast-growing product usage and a deep stack of AI agents that route, scope, and execute high-touch member requests.

Requirements

1+ years of experience building with LLMs (prompting, tool use, safety, evaluation).
Proven experience designing evals/regression suites for conversational agents.
Strong product sense and crisp communication; you can translate business needs into prompt behaviors.
Comfortable owning prompts end-to-end in a live system (classification, then routing, then tool calls, then member-facing responses).

Nice To Haves

Experience with RedwoodJS/TypeScript/GraphQL or similar stacks.
MCP/tool orchestration experience.
Luxury/concierge, travel, or high-touch service domain knowledge.

Responsibilities

Refine the global system prompt and per-category prompts for Houston (our AI mission control), driving down errors and ambiguity.
Design/run evals and regression tests for new prompt changes, tools, and MCP integrations.
Build and maintain prompt kits, playbooks, and examples that cover real-world edge cases (classification, intent, safety, intake, ideation vs. request).
Collaborate with eng/product to wire new APIs/tools into the MCP server and ensure the agent uses them reliably.
Ship fast: diagnose failures in production traces, patch prompts, validate with targeted evals, and land safe PRs.