Remote | Urdu-English AI Response Evaluator — $10–$20/hour

24-Mag • New York, NY

About The Position

We are sharing a specialised part-time consulting opportunity for Urdu-English bilingual professionals experienced in language evaluation, LLM response review, fact-checking, structured feedback, and high-quality written analysis in English. The role supports current and upcoming remote projects focused on Urdu-language AI response evaluation, including bilingual quality review, factual accuracy assessment, reasoning analysis, and rubric-based scoring. Selected professionals will assess Urdu AI-generated responses, identify strengths and areas for improvement, fact-check outputs against trusted sources, and write clear English-language feedback based on structured evaluation criteria.

Requirements

Native fluency in Urdu
Strong English proficiency and excellent written communication skills
A bachelor's degree or equivalent academic background
Significant experience using large language models and a solid understanding of how people use AI tools
Strong ability to explain what makes an AI response accurate, incomplete, unclear, unrealistic, or poorly reasoned
Excellent attention to detail and the ability to notice subtle issues in language, reasoning, and factual accuracy
Background or experience in structured analytical thinking, such as research, policy, analytics, linguistics, engineering, writing, or evaluation work
Urdu native fluency and strong English proficiency are required for project work

Nice To Haves

Prior experience with RLHF, model evaluation, data annotation, or AI response assessment
Experience writing, editing, or reviewing high-quality written content
Experience comparing multiple outputs and making fine-grained qualitative judgments
Familiarity with rubric-based evaluation, quality scoring, or structured feedback workflows
Ability to clearly explain factual inaccuracies, reasoning errors, and communication gaps

Responsibilities

Review Urdu AI-generated responses for accuracy, clarity, reasoning quality, tone, and completeness
Identify response strengths, improvement areas, factual inaccuracies, and communication gaps
Evaluate whether responses align with expected conversational behavior and project-specific guidelines
Apply native Urdu fluency and English writing ability to produce clear evaluation notes
Conduct fact-checking using trusted public sources and approved external tools
Assess whether responses are well-reasoned, complete, contextually appropriate, and useful
Identify subtle language issues, factual errors, unclear reasoning, or gaps in response quality
Generate high-quality human evaluation data through careful review and structured judgment
Apply structured rubrics and quality criteria to assess model response performance
Write clear, consistent, and reproducible feedback in English
Compare outputs and make fine-grained qualitative judgments when required
Maintain accuracy, consistency, and strong attention to detail across submitted evaluations