DataForce is seeking skilled Software Engineers to join our team as Coding Annotators supporting the development and evaluation of advanced AI models. This role focuses on creating high-quality coding prompts and answers, benchmarking model performance, and identifying failure cases across internal and competitor models.

The Coding Annotator will create programming prompts and reference solutions aligned with industry benchmarks such as SWE-Bench and Terminal-Bench, and will test model outputs to identify failures. The annotator will also support reinforcement learning workflows by building and maintaining realistic coding environments and executing coding-specific validation checks.

This role does not involve quality-checking Annotator++ outputs; instead, it focuses on domain-specific evaluation, benchmarking, and technical analysis to surface model limitations and performance insights.
Career Level: Senior
Number of Employees: 5,001-10,000