Senior Software Engineer - AI Interaction Evaluator (Codex / Claude Code, up to $200/hr)

G2i Inc.•Miami, FL

$50 - $200/hr

About The Position

We are seeking highly experienced software engineers (Senior+ level) to evaluate the quality of interactions with modern coding agents such as OpenAI Codex and Claude Code. This is not a traditional engineering role, and you will not be writing production code. Instead, you will assess something harder to measure: whether the AI model 'thinks' like a great engineer. You will review how AI coding agents behave in real-world scenarios, focusing on whether their responses are sensible, whether their preambles and reasoning are useful, whether their output reflects strong engineering judgment, and whether the interaction feels right to an experienced developer. This role emphasizes engineering 'taste' over mere syntax correctness.

Requirements

Staff / Principal-level engineer or equivalent experience.
Strong background in TypeScript / JavaScript or Python.
Hands-on experience using OpenAI Codex.
Hands-on experience using Claude Code.
Hands-on experience using Cursor.
Deep familiarity with modern AI-assisted dev workflows.
Able to evaluate code without needing to fully execute or deeply review every line.
Comfortable giving direct, opinionated feedback.
High bar for what “good engineering” looks like.

Nice To Haves

Experience with tools like Cursor or similar AI-first IDEs.
Prior exposure to prompt design or evaluation workflows.
Experience mentoring senior engineers or defining engineering standards.

Responsibilities

Evaluate AI-generated coding interactions end-to-end.
Judge whether outputs are useful, correct (at a high level), and aligned with how a strong engineer would think.
Assess the quality of explanations and reasoning, not just the code itself.
Distinguish between different levels of response quality.
Provide clear, opinionated feedback on what worked, what didn’t, and what felt “off” or misleading.
Help define what great looks like when interacting with tools like Cursor.