As part of our work on machine-generated dialog, we are developing novel measurements of its quality. These include cutting-edge LLM judges for aspects such as groundedness (lack of hallucinations), Siri Tone and Style (a suite of Design requirements), Safety, and others. To measure our progress on this front, we need to track the state of our dataset composition, the accuracy of the LLM judges, and human expert review results in a central, visual representation.
Job Type: Full-time
Career Level: Mid Level
Industry: Computer and Electronic Product Manufacturing
Education Level: Master's degree