About The Position

We are sharing a specialized part-time consulting opportunity for professors, PhD students, and advanced academic researchers with experience in domain-specific problem design, Python-based evaluation, benchmark task development, and structured reasoning assessment. The role supports current and upcoming remote consulting projects focused on academic benchmark task design, Python-based evaluation workflows, golden solution preparation, and model behavior analysis. Selected professionals will apply their academic expertise to create challenging real-world tasks, define precise expected outputs, develop executable tests, and evaluate reasoning and problem-solving performance across advanced subject areas.

Requirements

  • Current or retired professor, or current PhD student, in a relevant academic or professional field
  • Academic expertise in STEM, quantitative, professional, or research-intensive domains
  • Working proficiency in Python, demonstrated through research, industry work, GitHub projects, coursework, or technical task development
  • Ability to design rigorous domain-specific problems and evaluate solutions with precision
  • Strong reasoning, written communication, problem-solving, and independent work skills
  • Ability to manage time effectively and contribute reliably in a remote project-based environment
  • Availability for high-commitment project work, potentially 30+ hours per week on weekdays, depending on project scope
  • A completed or in-progress PhD from a strong university program is highly valued
  • Academic backgrounds may include machine learning, coding, data science, computer science, physics, mathematics, engineering, statistics, biology, chemistry, finance, accounting, economics, law, business, or related fields
  • Teaching, research, publication, technical writing, benchmark design, coding, or evaluation experience may be especially valuable
  • United States residence may be required, depending on project needs

Nice To Haves

  • Experience in AI training, model evaluation, benchmark development, data annotation, or structured task review
  • Experience writing Python tests, executable checks, golden solutions, or reproducible research code
  • Familiarity with agentic task design, model behavior analysis, reasoning evaluation, or failure-mode classification
  • Experience developing academic assessments, problem sets, rubrics, grading criteria, or research evaluation materials
  • Strong ability to turn complex academic or professional problems into clear, testable tasks

Responsibilities

  • Design challenging, real-world problems drawn from your academic or professional domain
  • Create tasks across areas such as machine learning, coding, data science, computer science, physics, mathematics, engineering, statistics, biology, chemistry, finance, accounting, economics, law, or business
  • Build tasks that test reasoning, problem solving, instruction following, and domain-specific judgment
  • Ensure task prompts are clear, rigorous, realistic, and aligned with expert-level expectations
  • Prepare task specifications, golden solutions, and supporting evaluation components using Python
  • Develop executable tests or structured checks that support objective evaluation
  • Translate complex domain problems into clear, testable workflows with measurable success criteria
  • Review task materials for correctness, completeness, reproducibility, and technical clarity
  • Evaluate model or agent performance on domain-specific tasks
  • Identify cases where outputs fail to satisfy tests, instructions, or expected reasoning standards
  • Classify failure modes involving logical reasoning, problem decomposition, technical execution, or domain understanding
  • Write clear analysis explaining where and why a task response succeeds or fails
  • Develop detailed rubrics and evaluation frameworks for academic and technical benchmark tasks
  • Apply consistent evaluation standards across tasks, outputs, and solution materials
  • Provide clear written feedback explaining quality, reasoning gaps, and improvement areas
  • Collaborate with other subject matter experts to support consistency and accuracy across review workflows

Benefits

  • Competitive hourly compensation
  • Remote structure
  • Flexible scheduling
  • Weekly payments
  • Projects may be extended, shortened, or adjusted depending on scope and performance
© 2026 Teal Labs, Inc