Member of Technical Staff - Evals

Entendre • New York City, NY

About The Position

As a Member of Technical Staff - Evals at Entendre, you will play a key role in ensuring the quality and reliability of our AI-powered features. You will design and maintain evaluation frameworks that measure the accuracy, reliability, and regression behavior of our AI capabilities. You will also build automated test harnesses that run on real accounting data to catch failures before they reach customers. In addition, you will define metrics and benchmarks that give the team clear, quantitative insight into model and system performance. You will create easy-to-use tooling that lets engineers write, run, and interpret evaluation tasks as a seamless part of their development workflow. Working closely with our Applied AI and Agent Engineering teams, you will ensure that every shipped capability has a well-defined standard of "good enough" and a robust method to verify it.

Requirements

Experience writing and reviewing evaluation tasks used by AI researchers to assess and improve model performance.
Background in quality assurance across software, client deliverables, or financial analysis.
Exceptional written communication skills, with the ability to clearly articulate the steps needed to achieve a desired outcome.

Nice To Haves

Familiarity with accounting, financial systems, tax preparation, or payment technologies.

Responsibilities

Designing and maintaining evaluation frameworks to measure the accuracy, reliability, and regression behavior of our AI capabilities.
Building automated test harnesses that run on real accounting data to identify potential failures before they affect our customers.
Defining metrics and benchmarks to provide the team with clear, quantitative insights into model and system performance.
Creating easy-to-use tooling that enables engineers to write, execute, and interpret evaluation tasks as a seamless part of their development workflow.
Collaborating closely with our Applied AI and Agent Engineering teams to ensure every shipped capability has a well-defined standard of "good enough" and a robust method to verify it.