At Sema4.ai, we’re building an Enterprise AI Agent platform that fundamentally changes how knowledge work gets done by enabling people and AI agents to collaborate in durable, trustworthy ways. As a Staff Engineer, AI Evals, you’ll design and own the evaluation systems that determine whether our agents are actually good: correct, reliable, efficient, and improving over time. You’ll build the measurement backbone that guides model choice, agent design, product decisions, and customer trust. This is an early, high-impact role. You’ll define how we measure success for AI agents in production, where ambiguity is real and ground truth can be messy. We’re looking for an engineer who brings rigor, judgment, and strong opinions about what “good” looks like, and who knows how to operationalize it.
Job Type
Full-time
Career Level
Senior