Generalist - English & Chinese (Mandarin)

Weekday AI

8d • $36 • Remote

About The Position

This role is for one of our clients. It focuses on improving the quality, accuracy, and reliability of conversational AI systems. You will evaluate and enhance how large language models (LLMs) respond to real-world queries, ensuring outputs are clear, well-reasoned, and aligned with human expectations across a wide range of topics.

Requirements

Bachelor’s degree in any discipline.
Native-level fluency in Mandarin Chinese (ILR 5 / CEFR C2) and strong proficiency in English.
Hands-on experience using large language models (LLMs) with a strong understanding of their applications.
Excellent writing skills with the ability to provide structured and nuanced feedback.
Strong attention to detail and ability to identify subtle inconsistencies or issues.
Adaptable and comfortable working across multiple domains and topics.
Background in a field that requires structured analytical thinking, such as research, analytics, policy, linguistics, or engineering.
Strong mathematical and logical reasoning skills at a college level.

Nice To Haves

Experience with RLHF (Reinforcement Learning from Human Feedback), model evaluation, or data annotation.
Background in writing, editing, or content quality review.
Experience comparing multiple outputs and making detailed qualitative judgments.
Familiarity with evaluation frameworks, scoring systems, or benchmarking methodologies.

Responsibilities

Evaluate how effectively AI-generated responses address user queries.
Conduct fact-checking using reliable public sources and external tools.
Generate high-quality evaluation data by identifying strengths, weaknesses, and factual inaccuracies.
Assess reasoning, clarity, tone, and completeness of responses.
Ensure outputs align with expected conversational standards and system guidelines.
Apply consistent annotations based on defined taxonomies, benchmarks, and evaluation frameworks.