JudgeBench: A Benchmark for Evaluating LLM-based Judges

arXiv preprint 2024, 2025-01-19