Text2KGBench-LettrIA: A refined benchmark for Text2Graph systems

Plu, Julien; Escobar, Oscar Moreno; Trouillez, Edouard; Gapin, Axelle; Lisena, Pasquale; Ehrhart, Thibault; Troncy, Raphaël
KBC-LM and LM-KBC Challenge (Knowledge Base Construction from Pre-trained Language Models) at ISWC 2025, 24th International Semantic Web Conference, 2-6 November 2025, Nara, Japan

Recent advances in Large Language Models (LLMs) have catalyzed significant research into automated knowledge graph (KG) construction from text, a fundamental challenge at the intersection of natural language processing and semantic web technologies. However, reliable evaluation of model performance is hindered by limitations in existing benchmarks such as Text2KGBench, which exhibit shortcomings in data quality, ontological consistency, and structural design. To address these issues, this paper introduces Text2KGBench-LettrIA, a substantially revised and curated benchmark derived from the DBpedia-WebNLG portion of Text2KGBench. Our primary contributions are: (1) the systematic refinement of 19 domain ontologies to enforce hierarchical structure and formal typing; (2) a complete re-annotation of 4,860 sentences, yielding over 14,000 high-fidelity triples under a strict set of annotation guidelines; and (3) an enriched data format with extended metadata that ensures reproducibility and supports multifaceted evaluation. We demonstrate the utility of the benchmark by evaluating proprietary LLMs in a zero-shot setting and open-weights LLMs after fine-tuning. Our results reveal a key finding: smaller, fine-tuned open-weights models can achieve higher F1 scores than their larger, proprietary counterparts, underscoring the critical role of high-quality, schema-aligned training data.
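For context, triple-extraction benchmarks of this kind are typically scored with micro-averaged precision, recall, and F1 over (subject, relation, object) triples. The sketch below illustrates that computation under the assumption of exact matching on normalized triples; the function name and the matching criterion are illustrative, not the paper's actual evaluation code.

```python
# Minimal sketch of triple-level micro precision/recall/F1.
# Assumes exact matching on (subject, relation, object) tuples;
# the benchmark's actual matching rules may differ.
from typing import Iterable, Tuple, Dict

Triple = Tuple[str, str, str]  # (subject, relation, object)

def micro_prf1(gold: Iterable[Triple], predicted: Iterable[Triple]) -> Dict[str, float]:
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # triples extracted exactly as annotated
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical example: one of two predicted triples matches the gold annotation.
gold = [("Nara", "country", "Japan"), ("Nara", "isPartOf", "Kansai")]
pred = [("Nara", "country", "Japan"), ("Nara", "leaderName", "Unknown")]
print(micro_prf1(gold, pred))  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```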


Type: Conference
City: Nara
Date: 2025-11-02
Department: Data Science
Eurecom Ref: 8395
Copyright: CEUR

PERMALINK: https://www.eurecom.fr/publication/8395