About The Position

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. This opportunity is project-based, not permanent employment. We're building a dataset that evaluates how well AI coding agents handle real-world developer tasks. You'll create challenging tasks and evaluation criteria within realistic simulated environments.

Requirements

  • Experienced developers, software engineers, and/or test automation specialists open to part-time, non-permanent projects
  • Degree in Computer Science, Software Engineering, or related fields
  • 5+ years in software development, primarily Python (FastAPI, pytest, async/await, subprocess, file operations)
  • Background in full-stack development, with experience building React-based interfaces (JavaScript/TypeScript) and robust back-end systems
  • Experience writing tests (functional and integration), not just running them
  • Experience with Docker containers and familiarity with infrastructure tools (Postgres, Kafka, Redis)
  • Understanding of CI/CD (GitHub Actions from a user's perspective: triggers, labels, reading results)
  • English proficiency at B2 level
  • Comfortable reading and reasoning about code across the stack

Responsibilities

  • Build virtual companies from a high-level plan: the codebase, infrastructure, and context (conversations, documentation, tickets) that together form a realistic environment with a development history
  • Assemble and calibrate tasks from intermediate states of the virtual company: craft the prompt, define evaluation criteria, and ensure the task is solvable and the evaluation is fair
  • Design tasks set in isolated environments that emulate a developer's workstation: a Linux machine with development tools (terminal, CLI), MCP servers (repository, task tracker, messenger, documentation, etc.), and a real web application codebase
  • Write tests that accept all correct solutions and reject incorrect ones, being neither too strict (breaking on valid approaches) nor too lenient (passing bad ones)
  • Iterate on tests with an AI agent, verifying that they catch real problems, don't miss bad solutions, and don't break on good ones
  • Review code written by agents, analyze why an agent failed or succeeded, and design edge cases and adversarial scenarios
  • Iterate based on feedback from expert QA reviewers who score your work on quality criteria

Benefits

  • Project-based AI opportunities
  • Up to $12 per hour equivalent compensation
© 2026 Teal Labs, Inc