Remote | LLM Personal Assistant Evaluation Specialist — $70–$180/hour

May 24 • New York, NY

16h•$70 - $180•Remote

About The Position

We are sharing a specialized part-time consulting opportunity for advanced LLM power users experienced in personalized AI workflows, rubric-based evaluation, real-world task assessment, personal productivity systems, and high-context decision support. This role supports current and upcoming remote consulting projects focused on evaluating how AI systems handle personalized, real-world life tasks across food, health, productivity, career, learning, research, planning, and personal workflow scenarios. Selected professionals will create realistic prompts, complete complex AI-assisted tasks, record workflow execution, design or apply detailed rubrics, and evaluate whether AI outputs are useful, personalized, practical, safe, and successful in real-life contexts.

Requirements

Heavy personal usage of LLM products and AI tools
Experience using AI for multi-step tasks, planning, research, decision-making, personal workflows, or life administration
Familiarity with tools such as ChatGPT, Claude, Gemini, Perplexity, Cursor, Windsurf, Codex, or other AI agents
Strong ability to explain what makes an AI output useful, incomplete, unsafe, unrealistic, generic, or poorly personalized
Extensive rubric experience, including prior rubric design, evaluation, and quality assessment work
Strong written judgment, attention to detail, and ability to evaluate against structured criteria
Ability to complete tasks within 24 hours when project timing requires
Practical experience using LLMs for complex personal workflows, rubric-based evaluation, research, writing, QA, product testing, or AI assessment is highly relevant
Access to a desktop or laptop computer suitable for project work and screen recording
Based in the United States, depending on project needs

Nice To Haves

100+ hours of prior rubric-related work involving rubric design, evaluation, model assessment, quality review, or structured judgment
Experience evaluating AI tools across personal productivity, career planning, food recommendations, learning workflows, health-adjacent reasoning, or personal research tasks
Strong familiarity with personal AI workflows involving calendars, reminders, errands, job applications, LinkedIn, resumes, study plans, restaurant selection, or decision support
Ability to record screen-based workflows clearly and follow detailed task instructions

Responsibilities

Create written responses, prompts, and explanations for complex personal-life tasks
Evaluate whether AI outputs are practical, well-reasoned, personalized, realistic, and successful
Identify where outputs succeed, miss context, overreach, provide generic advice, or fail to account for real constraints
Use hands-on LLM experience to assess real-world usefulness across high-context personal workflows
Apply structured rubrics and quality criteria to evaluate AI system performance
Create detailed evaluation rubrics for complex personal tasks and multi-step workflows
Judge outputs against criteria involving usefulness, personalization, reasoning quality, safety, completeness, and success conditions
Write clear, specific, and well-supported feedback explaining evaluation decisions
Execute AI-assisted tasks while recording screens according to project instructions
Review task performance across tools, prompts, reasoning steps, outputs, and final recommendations
Complete research-intensive personal workflows end-to-end within expected turnaround timelines
Maintain careful documentation of task setup, execution, rubric design, and evaluation results

Benefits

Flexible scheduling
Competitive hourly compensation
Weekly payments via Stripe or Wise
Projects may be extended, shortened, or adjusted depending on scope and performance
