About The Position

UniversalAGI is a forward-deployed AI research lab building the future of AI automation. We are the go-to strategic AI partner when enterprises and government agencies need to gain a competitive advantage, lead market transformation, or accelerate AI adoption. We've achieved significant traction with enterprise clients and are at an inflection point, scaling our technical capabilities to the next level.

We're backed by Eric Schmidt, Elad Gil, Ion Stoica, and David Patterson, and our team brings experience from OpenAI, Tesla, NVIDIA, Apple, Palantir, Amazon, Princeton, Stanford, and UC Berkeley.

We are seeking an exceptional Founding LLM Evaluation Researcher to build our evaluation frameworks from the ground up, stay at the forefront of AI research, design and execute rigorous experiments to evaluate autonomous agents, and develop innovative methodologies that improve agent performance and capabilities in real-world deployments.

Responsibilities

  • Design comprehensive LLM evaluation frameworks from scratch for AI automation in government and enterprise environments
  • Build evaluation systems to measure and improve AI solution performance across production deployments
  • Develop evaluation methodologies for multi-agent systems operating autonomously in real-world applications
  • Optimize LLM outputs for specific enterprise use cases involving both structured databases and unstructured document repositories
  • Develop methodologies that improve model response accuracy and relevance for domain-specific applications
  • Bridge research findings into production-ready platform capabilities with robust evaluation metrics
  • Conduct rigorous evaluation experiments to optimize agent performance and reliability
  • Stay current with cutting-edge research by reading and synthesizing findings from top-tier AI conferences and journals
  • Design and execute data collection strategies to build high-quality evaluation datasets tailored to specific use cases
  • Develop methodologies to achieve and maintain high accuracy standards across diverse AI automation tasks
  • Prototype new techniques and models for building autonomous AI agents, focusing on improving accuracy, efficiency, and reliability
  • Collaborate closely with product engineers to translate research advancements into practical applications and deployable solutions
  • Work with enterprise clients to understand evaluation requirements and success metrics
  • Document and communicate research findings through internal presentations, reports, and potentially external publications or conferences
  • Contribute actively to defining research roadmaps, prioritizing experimental directions based on potential impact and feasibility