Can we trust the judges? Validation of factuality evaluation methods via answer perturbation

© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in EvalLLM 2025, Workshop on Evaluation Generative Models and Challenges, co-located with TALN, 30 June 2025, Marseille, France, and is available at:
