From the semantic web to the synthetic web: Knowledge graphs as infrastructure for generative AI

Authors

J.-A. Pástor-Sánchez

DOI:

https://doi.org/10.3145/thinkepi.2025.e19a30

Keywords:

Knowledge graphs, Generative AI, LLMs, Semantic web, Synthetic web, Wikidata

Abstract

The semantic web was born of the idea of publishing structured, interlinked data. Its adoption was held back by technical barriers, a lack of accessible tools and a weak culture of open data. Even so, projects such as DBpedia and, above all, Wikidata showed that knowledge graphs can support reliable and useful applications in specific domains. The emergence of generative AI based on large language models (LLMs) has changed the landscape. These systems produce fluent responses, but they also hallucinate and offer little traceability. Moreover, their knowledge is implicit and difficult to update. This has increased interest in integrating generative AI with knowledge graphs, which can contribute precision, coherence and verifiability. Hybrid approaches are therefore emerging: retrieval-augmented generation (RAG) that queries RDF graphs, triple generation, assistance with knowledge engineering, graph-constrained reasoning and training on structured data. In this context, Wikidata has become key infrastructure for a “synthetic web”, since its capacity to represent structured data and its potential for fact verification make it possible to build more reliable generative AI systems.
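Two ideas in the abstract, RAG over an RDF graph and factual verification against Wikidata, can be made concrete with a small sketch. It rests on several assumptions that are not the article's method. The "graph" is a toy Python set of a few Wikidata-style triples copied by hand, standing in for a live query. The claim triple is assumed to have already been extracted from model output, and that extraction step is not shown. The rule that a conflicting value counts as a contradiction only for single-valued properties is an illustrative choice. The names `retrieve`, `sparql_for`, `build_prompt` and `verify` are hypothetical.

```python
from enum import Enum

# Toy knowledge graph (illustrative, hand-copied): Wikidata-style
# (subject, property, object) triples. wd:Q42 = Douglas Adams,
# wdt:P31 = instance of, wd:Q5 = human, wdt:P19 = place of birth,
# wd:Q350 = Cambridge. A live system would query Wikidata instead.
KG: set[tuple[str, str, str]] = {
    ("Q42", "P31", "Q5"),
    ("Q42", "P19", "Q350"),
}

# Assumption: properties treated as single-valued, so a claim with a
# different object is reported as contradicted rather than unknown.
FUNCTIONAL_PROPERTIES = {"P19"}


class Verdict(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNKNOWN = "unknown"


def retrieve(subject: str, kg: set = KG) -> list[tuple[str, str, str]]:
    """Retrieval step of a RAG pipeline: every triple about `subject`."""
    return sorted(t for t in kg if t[0] == subject)


def sparql_for(subject: str, prop: str) -> str:
    """SPARQL a live system could send to the Wikidata Query Service,
    where the wd: and wdt: prefixes are predefined."""
    return f"SELECT ?o WHERE {{ wd:{subject} wdt:{prop} ?o . }}"


def build_prompt(question: str, subject: str) -> str:
    """Ground the model by placing the retrieved triples in its prompt."""
    facts = "\n".join(" ".join(t) for t in retrieve(subject))
    return f"Answer using only these facts:\n{facts}\n\nQuestion: {question}"


def verify(claim: tuple[str, str, str], kg: set = KG) -> Verdict:
    """Check one claim triple, assumed already extracted from LLM output."""
    if claim in kg:
        return Verdict.SUPPORTED
    subject, prop, _ = claim
    # The claim itself is absent, so any match here has a different object.
    has_other_value = any(s == subject and p == prop for s, p, _ in kg)
    if has_other_value and prop in FUNCTIONAL_PROPERTIES:
        return Verdict.CONTRADICTED
    return Verdict.UNKNOWN


if __name__ == "__main__":
    print(verify(("Q42", "P19", "Q350")).value)  # supported
    print(verify(("Q42", "P19", "Q84")).value)   # contradicted
    print(verify(("Q42", "P31", "Q1")).value)    # unknown: P31 is multi-valued
```

Returning `UNKNOWN` rather than treating a missing triple as false follows the open-world assumption of RDF: the absence of a statement in Wikidata does not mean the fact is untrue.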

Published

2025-12-12

How to Cite

Pástor-Sánchez, J.-A. (2025). From the semantic web to the synthetic web: Knowledge graphs as infrastructure for generative AI. Anuario ThinkEPI, 19. https://doi.org/10.3145/thinkepi.2025.e19a30

Section

Information and communication technologies