Remote | Academic Research & Python Task Consultant — $70–$100/hour

24 May • New York, NY

Remote

About The Position

We are sharing a specialised part-time consulting opportunity for professors, PhD students, and advanced academic researchers with experience in domain-specific problem design, Python-based evaluation, benchmark task development, and structured reasoning assessment. The role covers current and upcoming remote projects that also involve golden solution preparation and model behavior analysis, with a strong emphasis on high-quality execution. Selected professionals will apply their academic expertise to create challenging real-world tasks, define precise expected outputs, develop executable tests, and evaluate reasoning and problem-solving performance across advanced subject areas.

Requirements

Current or retired professor, or current PhD student, in a relevant academic or professional field
Academic expertise in STEM, quantitative, professional, or research-intensive domains
Working proficiency in Python, demonstrated through research, industry work, GitHub projects, coursework, or technical task development
Ability to design rigorous domain-specific problems and evaluate solutions with precision
Strong reasoning, written communication, problem-solving, and independent work skills
Ability to manage time effectively and contribute reliably in a remote project-based environment
Availability for high-commitment project work, potentially 30+ hours per week on weekdays, depending on project scope
A completed or in-progress PhD from a strong university program is highly relevant
Academic backgrounds may include machine learning, coding, data science, computer science, physics, mathematics, engineering, statistics, biology, chemistry, finance, accounting, economics, law, business, or related fields
Teaching, research, publication, technical writing, benchmark design, coding, or evaluation experience may be especially valuable
Some projects require candidates to be based in the United States

Nice To Haves

Experience in AI training, model evaluation, benchmark development, data annotation, or structured task review
Experience writing Python tests, executable checks, golden solutions, or reproducible research code
Familiarity with agentic task design, model behavior analysis, reasoning evaluation, or failure-mode classification
Experience developing academic assessments, problem sets, rubrics, grading criteria, or research evaluation materials
Strong ability to turn complex academic or professional problems into clear, testable tasks

Responsibilities

Design challenging, real-world problems drawn from your academic or professional domain
Create tasks across areas such as machine learning, coding, data science, computer science, physics, mathematics, engineering, statistics, biology, chemistry, finance, accounting, economics, law, or business
Build tasks that test reasoning, problem solving, instruction following, and domain-specific judgment
Ensure task prompts are clear, rigorous, realistic, and aligned with expert-level expectations
Prepare task specifications, golden solutions, and supporting evaluation components using Python
Develop executable tests or structured checks that support objective evaluation
Translate complex domain problems into clear, testable workflows with measurable success criteria
Review task materials for correctness, completeness, reproducibility, and technical clarity
Evaluate model or agent performance on domain-specific tasks
Identify tasks where outputs fail to satisfy tests, instructions, or expected reasoning standards
Classify failure modes involving logical reasoning, problem decomposition, technical execution, or domain understanding
Write clear analysis explaining where and why a task response succeeds or fails
Develop detailed rubrics and evaluation frameworks for academic and technical benchmark tasks
Apply consistent evaluation standards across tasks, outputs, and solution materials
Provide clear written feedback explaining quality, reasoning gaps, and improvement areas
Collaborate with other subject matter experts to support consistency and accuracy across review workflows

Benefits

Competitive hourly compensation
Fully remote work
Flexible scheduling
Weekly payments
Projects may be extended, shortened, or adjusted depending on scope and performance
