We are seeking a Research Engineer specializing in Evals to join our mission of building everyday AGI. This role is crucial for ensuring that our models, agents, and product features demonstrably improve. You will build the evaluation harness for AGI, covering model capability, agentic behavior, on-device performance, and end-user experience. You will own the eval suites that gate every model and agent release, including capability, behavior, regression, and human-rated rubric evaluations, and you will develop the dashboards and tooling that support researcher experiment loops and leadership decision-making. Your work will define what counts as a 'shipped' product and uphold that standard of product readiness under deadline pressure.
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed