As a Member of Technical Staff - Evals at Entendre, you will play a key role in ensuring the quality and reliability of our AI-powered features. You will design and maintain evaluation frameworks that measure the accuracy, reliability, and regression behavior of our AI capabilities, and build automated test harnesses that run on real accounting data to catch failures before they reach our customers. You will also define metrics and benchmarks that give the team clear, quantitative insight into model and system performance, and create easy-to-use tooling that lets engineers write, run, and interpret evaluation tasks as part of their normal development workflow. You will collaborate closely with our Applied AI and Agent Engineering teams to ensure every shipped capability has a well-defined standard of "good enough" and a robust method to verify it.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed