- Build and evolve the Real-Time Intelligence evaluations platform: implement offline and online eval pipelines, including golden datasets, human review workflows, and LLM-as-judge / auto-raters for agents, anomaly detectors, and decisioning systems.
- Instrument agentic solutions for observability by wiring up telemetry, tracing, structured logging, and dashboards, so that quality, safety, latency, and cost are easy to monitor and debug.
- Integrate evals into the development lifecycle by connecting pipelines to CI/CD, canary and A/B experiments, and phased rollouts, making it simple for partner teams to run and interpret evaluations.
- Collaborate with and mentor product, research, and engineering teams: share best practices on eval design, LLM-as-judge usage, and Responsible AI, and provide code reviews and guidance that raise the bar for AI features.
Job Type: Full-time
Career Level: Mid Level
Number of Employees: 5,001-10,000