abra R&D is seeking a Senior AI Evaluation & Reliability Engineer to help build a next-generation agentic analytics platform: the first real-time database optimized for AI agents at scale. The role centers on defining and building the methodologies for measuring, validating, monitoring, and improving AI agents in production, sitting at the intersection of LLM systems, evaluation research, and production-grade engineering. The engineer will design evaluation methodologies, construct LLM-as-a-judge systems, and develop agent-based testing frameworks that ensure the correctness, robustness, and reliability of complex multi-agent workflows operating on real-time data.
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed