We are a small team tackling an ambitious problem. If we are successful, it will change the course of history. As such, we have a very high talent bar and are looking for people who have done something remarkable.

This role owns the testing and evaluation systems that determine whether Archie is actually becoming a better engineer. You will design, implement, and operate the evals that benchmark Archie against real-world engineering skill expectations, ensure it is learning the right things, and prevent regressions as the system evolves. You will work closely with AI researchers, software engineers, domain experts, and industrial partners to translate engineering judgment into scalable, automated evaluation frameworks. Your work will directly shape how we measure progress toward engineering AGI.

We don’t care if you’ve done it before. We just need you to be brilliant, mission-driven, and thirsty to learn.

This role can be either remote (based in the US or Canada, with existing work authorization) or based in our SF office. If you are remote, you should plan to spend one week out of every six co-working with the rest of the company in our SF office. We will support relocation for candidates interested in moving to SF.
Job Type
Full-time
Career Level
Entry Level
Education Level
No Education Listed