We are looking for strong engineers to join our team and own the leaderboards on Vals AI. You will be responsible for testing and benchmarking new models, as they are released, on tasks in law, tax, coding, finance, and more. You will analyze models' error modes, evaluate their strengths and weaknesses, and work with our communications team to publish results. Our results are used by startups, enterprises, and research labs alike. We work with all the major foundation model labs, as well as some of the largest financial institutions and hospital systems in the world. Our work has been featured in the Wall Street Journal, the Washington Post, and Bloomberg. We are building the standard for evaluating how well LLMs perform real-world tasks, and you will contribute directly to the leaderboards that make this possible.
Job Type: Full-time
Career Level: Entry level
Education Level: Not listed