About The Position

We are seeking highly experienced software engineers (Senior+) to evaluate the quality of interactions with modern coding agents such as OpenAI Codex and Claude Code. This is not a traditional engineering role; you will not be writing production code. Instead, you will evaluate something harder to measure: whether the AI model thinks like a great engineer. You will assess how AI coding agents behave in real-world scenarios, focusing on whether each response makes sense, whether the preamble and reasoning are useful, whether the output reflects strong engineering judgment, and whether the interaction feels right to an experienced developer. This role emphasizes engineering taste over mere syntactic correctness.

Requirements

  • Staff / Principal-level engineer or equivalent experience.
  • Strong background in TypeScript / JavaScript or Python.
  • Hands-on experience using OpenAI Codex, Claude Code, and Cursor.
  • Deep familiarity with modern AI-assisted dev workflows.
  • Ability to evaluate code without needing to fully execute or deeply review every line.
  • Comfortable giving direct, opinionated feedback.
  • High bar for what "good engineering" looks like.

Nice To Haves

  • Experience with tools like Cursor or similar AI-first IDEs.
  • Prior exposure to prompt design or evaluation workflows.
  • Experience mentoring senior engineers or defining engineering standards.

Responsibilities

  • Evaluate AI-generated coding interactions end-to-end.
  • Judge whether outputs are useful, correct (at a high level), and aligned with how a strong engineer would think.
  • Assess the quality of explanations and reasoning, not just the code itself.
  • Distinguish between different levels of response quality.
  • Provide clear, opinionated feedback on what worked, what didn’t, and what felt "off" or misleading.
  • Help define what great looks like when interacting with tools like Cursor.

Benefits

  • Up to $200/hr (US and Canada)
  • Up to $150/hr (EU and Latam)
  • Up to $100/hr (Other locations)
  • Project-based work
  • ~10–20 hours/week (potentially 40+ hours/week once a project starts)
  • Possible extension

© 2026 Teal Labs, Inc.