As an Applied Scientist focused on Evaluation & Model Behavior, you will design and implement the systems used to measure and improve the performance of Computer Use Agents. This is not a support role. You will be responsible for the technical definition of model quality, including the design of evaluation metrics, the curation of training datasets, and the engineering of system prompts. You'll work directly with the engineering team to translate product requirements into technical specifications and quantifiable benchmarks. You'll focus on rigor, clarity, and impact, ensuring every metric, dataset, and prompt moves us toward more reliable, trustworthy agents.
Job Type: Full-time
Career Level: Mid Level