CRAG-MM 2025 Challenge, in the 2nd CRAG-MM Challenge: Improving RAG with Real-World Benchmarks, co-located with the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 3-7 August 2025, Toronto, Canada
This technical report details the approach of D2KLab at EURECOM to the Comprehensive Retrieval-Augmented Generation Multi-Modal Challenge (CRAG-MM) 2025, organized by Meta at KDD 2025. Our solution relies on a modular pipeline that integrates a Vision Language Model (VLM) and uses both image search and web search APIs. It tackles all three proposed subtasks by combining pipeline components that perform domain classification, entity extraction, image segmentation for refined image search, and web content re-ranking. Overall, our approach ranked 39th on the Truthfulness metric with a score of -0.081 for the multi-turn, multi-source Task 3, 43rd with a score of -0.176 for Task 2, and 52nd with a score of -0.205 for Task 1. The pipeline uses less than half of the allocated 10-second budget per query. To support reproducibility, we release the source code of our approach at https://gitlab.aicrowd.com/semere_wubshet/d2klab-meta-cragmm-2025.
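The Truthfulness scores reported above are negative. The abstract does not define the metric, so the sketch below assumes a CRAG-style scheme in which each judged answer scores +1 if correct, 0 if missing (for example "I don't know"), and -1 if incorrect, with Truthfulness being the average over all queries. Under that assumption, a negative score simply means the system gave more incorrect answers than correct ones. The `SCORES` table and the `truthfulness` helper are illustrative names, not part of the challenge's official evaluation code.

```python
from typing import Iterable

# Assumed CRAG-style per-answer scores (not taken from the report):
# correct answer -> +1, missing answer ("I don't know") -> 0,
# incorrect (hallucinated) answer -> -1.
SCORES = {"correct": 1.0, "missing": 0.0, "incorrect": -1.0}


def truthfulness(labels: Iterable[str]) -> float:
    """Average per-answer score, i.e. accuracy minus hallucination rate."""
    labels = list(labels)
    if not labels:
        raise ValueError("need at least one judged answer")
    return sum(SCORES[label] for label in labels) / len(labels)


# Hypothetical judgements for 10 queries: 2 correct, 5 missing, 3 incorrect.
example = ["correct"] * 2 + ["missing"] * 5 + ["incorrect"] * 3
print(truthfulness(example))  # -0.1
```

This scheme rewards abstaining ("missing" costs nothing) over guessing wrongly, which is why a system can improve its score by declining to answer when retrieval finds no reliable evidence.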
Type: Conference
City: Toronto
Date: 2025-08-03
Department: Communication systems
Eurecom Ref: 8383