We are building a benchmark dataset to evaluate AI models on professional document understanding and instruction following in the Engineering & Built Environment domain. Each task is a complex, multi-step request grounded in real-world workspace files (technical drawings, project specifications, engineering reports), web search, and code execution, and is paired with a clearly defined ground-truth output and an objective evaluation rubric. You will author tasks that test an AI's ability to interpret engineering documentation, follow multi-step instructions, and produce precise, well-structured outputs.
Job Type
Part-time
Career Level
Senior
Education Level
No Education Listed