AI Operations & Quality Engineer

Summit•Kelowna, BC

3d•CA$80,000 - CA$120,000•Hybrid

About The Position

This role focuses on ensuring the quality and efficiency of AI agents within a commercial brokerage. The AI Operations & Quality Engineer will be responsible for testing agent outputs, designing evaluation rubrics, troubleshooting issues, and optimizing model performance. They will also curate the memory library, author new skills, and iterate on agent prompts. A key aspect of the role involves working closely with broker teams to understand their needs, identify bugs, and translate feedback into product improvements. The engineer will partner with agent builders, review specifications, design test plans, and document agent capabilities. The goal is to ensure AI agents are reliable, cost-effective, and meet the needs of brokers and the business.

Requirements

2-4 years working hands-on with LLM-powered applications, AI agents, or production prompt-engineered systems (formal CS degree not required if your project work demonstrates the skill set).
Strong instinct for prompt engineering. You can read a system prompt and explain why an agent is misbehaving.
Comfort writing Python at a scripting level (API calls, JSON manipulation, light data transformation) — you'll be authoring skill scripts.
Experience designing test cases and evaluation criteria for non-deterministic systems, or a clear conceptual framework for doing so.
Sharp diagnostic instincts: when an agent misfires, you can trace whether the issue lives in the memory, the skill, the prompt, the model, or the input.
Strong written communication — you'll translate "this agent broke" into a clean diagnosis and a fix log the rest of the team can act on.

Nice To Haves

Hands-on experience with Anthropic's Claude or comparable frontier models (Opus / Sonnet / Haiku tier selection).
Familiarity with evaluation frameworks and rubric-based grading.
Experience in insurance, financial services, or other regulated-document workflows where output accuracy is non-negotiable.
Exposure to HubSpot CRM, Slack, and Google Workspace as everyday operational tools.
Background in high-growth startups or AI-forward operations teams.

Responsibilities

Run regression tests against canonical scenarios for every production agent to catch drift, broken outputs, and edge cases.
Design and calibrate evaluation rubrics for each agent's deliverables and analyze score patterns to flag systemic issues.
Reproduce failed runs surfaced by brokers, root-cause them, and ship fixes via memory updates, skill edits, or prompt revisions.
Audit branded deliverables for brand compliance across fonts, logos, layouts, and data structure.
Curate the memory library inside Synapse: dedupe, retire stale facts, tune importance and "when to use" routing so agents pull the right context.
Author new skills (documented playbooks + Python scripts) that turn repeatable workflows into reusable agent capabilities, with credential handling and verified outputs.
Iterate on agent system prompts and sub-agent personas, keeping them tight, testable, and aligned with how brokers actually phrase requests.
Monitor model spend across the agent fleet and tune model selection (Opus / Sonnet / Haiku and sub-agent defaults) to balance quality, latency, and cost.
Identify hot spots (agents over-spending on simple problems or under-spending on hard ones) and rebalance.
Track token usage, latency, and error rates over time; publish a weekly fleet health report for the team.
Embed with broker teams, turning their feedback on bugs or feature gaps into roadmap items.
Partner with agent builders on new agents, reviewing specs, designing test plans, and signing off on launches.
Document agent capabilities, known limitations, and broker-facing workflows.

Benefits

Opportunity for Growth: Work with a forward-thinking commercial brokerage and be part of an innovative, growing team.
Modern Workspace: Work from our brand-new, state-of-the-art offices in downtown Kelowna and Winnipeg. Enjoy a flexible hybrid model designed to support collaboration, focus, and balance.
Technology-Driven Culture: Work with cutting-edge tools and custom-built technology to get time back in your day. Laptops and equipment are provided to all staff.
Comprehensive Benefits: Access to flexible health, mental health, and dental plans tailored to your lifestyle, plus competitive vacation and personal day allotments.
Supportive Team: Participate in daily team huddles and collaborative events as part of a values-driven culture.
