Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.

You design computational engineering problems to challenge a frontier AI model. Each problem must have an answer verifiable by code and must require a specialized tool such as OpenSeesPy, CalculiX, YADE, or bempp-cl. Generic numerical libraries on their own won't cut it. Each problem runs inside a sealed Linux container with the tool pre-installed and a programmatic judge that grades the model's answer.

As an expert author, you pick an anchor tool and design a problem that hinges on its solvers, simulation kernels, or domain-specific models. You write a Python reference solution and supply input files and geometry definitions where needed. You decide the numerical answer and how close the model needs to get, within a domain-appropriate tolerance, to count as right. You test the problem against the model in batches of parallel attempts, tuning the difficulty until the agent succeeds in only a small number of attempts. Once you're happy with the task and it scores within range, it goes to a senior reviewer in your subfield, who provides feedback to keep task quality high.

Calibration requires patience. You're tuning the problem against batches of parallel runs of the agent, aiming for a pass rate in the 10–30% band. Reaching that means rewriting load cases, tightening boundary conditions, and watching how the agents act. You'll learn how these agents cut corners, where a simulation stalls, where a solver converges.

This time compounds in two directions. You come out of each task with a deeper command of the anchor tool itself and a hands-on working intuition for how a frontier model navigates complex structural and geotechnical problems.
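The grading and calibration loop described above can be sketched roughly as follows. This is an illustration only, not the platform's actual judge: the function names, the choice of a relative tolerance, and the sample numbers are all assumptions made for the example.

```python
# Hypothetical sketch of tolerance-based grading and pass-rate calibration.
# None of these names come from the platform; the numbers are made up.

def within_tolerance(answer: float, reference: float, rel_tol: float) -> bool:
    """Accept an answer whose relative error against the reference is <= rel_tol."""
    return abs(answer - reference) <= rel_tol * abs(reference)


def pass_rate(answers: list[float], reference: float, rel_tol: float) -> float:
    """Fraction of a batch of parallel attempts that the judge would accept."""
    if not answers:
        raise ValueError("batch must contain at least one attempt")
    passed = sum(within_tolerance(a, reference, rel_tol) for a in answers)
    return passed / len(answers)


def in_target_band(rate: float, low: float = 0.10, high: float = 0.30) -> bool:
    """Check the batch pass rate against the 10-30% calibration band."""
    return low <= rate <= high


# Illustrative batch of ten attempts against a made-up reference value.
reference = 12.5
batch = [12.5, 12.9, 11.0, 12.45, 14.2, 9.8, 13.3, 12.0, 15.1, 10.4]
rate = pass_rate(batch, reference, rel_tol=0.02)
print(f"pass rate: {rate:.0%}, in band: {in_target_band(rate)}")
```

A rate above the band suggests the problem is too easy or the tolerance too loose; a rate below it suggests the reverse.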
Job Type: Part-time
Career Level: Mid Level