From the semantic web to the synthetic web: Knowledge graphs as infrastructure for generative AI
DOI: https://doi.org/10.3145/thinkepi.2025.e19a30
Keywords: Knowledge graphs, Generative AI, LLMs, Semantic web, Synthetic web, Wikidata
Abstract
The semantic web was born with the idea of publishing structured, interconnected data. Its adoption was limited by technical barriers, a lack of accessible tools and a weak culture of open data. Even so, projects like DBpedia and, above all, Wikidata demonstrated that knowledge graphs can support reliable and useful applications in specific domains. The emergence of generative AI based on large language models (LLMs) has changed the landscape. These systems produce fluent responses, but they also hallucinate and lack traceability. In addition, their knowledge is implicit and difficult to update. This has increased interest in integrating generative AI with knowledge graphs to provide precision, coherence and verification. Hybrid approaches that combine LLMs with knowledge graphs are therefore emerging: retrieval-augmented generation (RAG) techniques that query RDF graphs, triple generation, assistance in knowledge engineering, graph-constrained reasoning and training with structured data. In this context, Wikidata has become key infrastructure for a “synthetic web”: its capacity to represent structured data and its potential for factual verification make it possible to build more reliable generative AI systems.
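To make the idea of RAG over a knowledge graph more concrete, the following minimal Python sketch retrieves the triples about one entity from a toy in-memory graph and splices them into a prompt that asks the model to answer only from those facts. It is an illustration under stated assumptions, not the method of this article or of any cited work: the `Triple`, `retrieve` and `build_prompt` names are hypothetical, the four triples stand in for an RDF store such as Wikidata (which would normally be queried with SPARQL), entity linking is assumed to have already happened, and the call to an actual LLM is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One subject-predicate-object statement, as in an RDF graph."""
    subject: str
    predicate: str
    obj: str


# Hypothetical toy graph standing in for a real RDF store such as Wikidata.
GRAPH = [
    Triple("Douglas Adams", "instance of", "human"),
    Triple("Douglas Adams", "place of birth", "Cambridge"),
    Triple("Douglas Adams", "occupation", "writer"),
    Triple("Cambridge", "country", "United Kingdom"),
]


def retrieve(graph, entity):
    """Retrieval step: return every triple whose subject is the entity (1-hop)."""
    return [t for t in graph if t.subject == entity]


def build_prompt(question, facts):
    """Augmentation step: ground the question in the retrieved facts."""
    lines = [f"- {t.subject} | {t.predicate} | {t.obj}" for t in facts]
    return (
        "Answer using only the facts below; say 'unknown' if they are insufficient.\n"
        "Facts:\n" + "\n".join(lines)
        + f"\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Entity linking is assumed: we already know the question is about Douglas Adams.
    facts = retrieve(GRAPH, "Douglas Adams")
    print(build_prompt("Where was Douglas Adams born?", facts))
    # The resulting prompt would then be sent to an LLM (not shown here).
```

Because every fact in the prompt comes from an explicit triple, an answer can be traced back to the graph, which is the kind of traceability the abstract contrasts with the implicit knowledge of an LLM.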
References
Ang, Alan (2025). “Wikidata-Wikimedia’s knowledge graph in a world of generative AI”. Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Wikidata-Wikimedia%27s_knowledge_graph_in_a_world_of_generative_AI_(2025).pdf
Cai, Linyue; Yu, Chaojia; Kang, Yongqi; Fu, Yu; Zhang, Heng; Zhao, Yong (2025). “Practices, opportunities and challenges in the fusion of knowledge graphs and large language models”. Frontiers in computer science, v. 7, 1590632. https://doi.org/10.3389/fcomp.2025.1590632
Dang, Minh-Hoanh; Pham, Thi-Hoang-Thi; Molli, Pascal; Skaf-Molli, Hala; Gaignard, Alban (2025). “LLM4Schema.org: Generating Schema.org markups with Large Language Models”. Semantic web: Interoperability, usability, applicability, v. 16, n. 6. https://doi.org/10.1177/22104968251382172
De-Santis, Antonio; Balduini, Marco; De-Santis, Federico; Proia, Andrea; Leo, Arsenio; Brambilla, Marco; Della-Valle, Emanuele (2025). “Integrating Large Language Models and knowledge graphs for extraction and validation of textual test data”. In: Demartini, Gianluca; Hose, Katja; Acosta, Maribel; Palmonari, Matteo; Cheng, Gong; Skaf-Molli, Hala; Ferranti, Nicolas; Hernández, Daniel; Hogan, Aidan (eds.). The Semantic Web – ISWC 2024 (Lecture Notes in Computer Science, vol. 15233). Springer. https://doi.org/10.1007/978-3-031-77847-6_17
Garijo, Daniel; Poveda-Villalón, María; Amador-Domínguez, Elvira; Wang, ZiYuan; García-Castro, Raúl; Corcho, Oscar (2025). “LLMs for ontology engineering: A landscape of tasks and benchmarking challenges”. In: Alharbi, Reham; De-Berardinis, Jacopo; Groth, Paul; Meroño-Peñuela, Albert; Simperl, Elena; Tamma, Valentina (eds.). Proceedings of the special session on harmonising generative AI and semantic web technologies (HGAIS 2024) co-located with the 23rd International Semantic Web Conference (ISWC 2024). CEUR Workshop Proceedings, v. 3953. https://ceur-ws.org/Vol-3953/364.pdf
Höltgen, Lea; Zentgraf, Sven; Hagedorn, Philipp; König, Markus (2025). “Utilizing large language models for semantic enrichment of infrastructure condition data: a comparative study of GPT and Llama models”. AI in civil engineering, v. 4, art. 14. https://doi.org/10.1007/s43503-025-00055-9
Johnson, Isaac; Kaffee, Lucie-Aimée; Redi, Miriam (2024). “Wikimedia data for AI: a review of Wikimedia datasets for NLP tasks and AI-assisted editing”. In: Kaffee, Lucie-Aimée; Fan, Angela; Gwadabe, Tajuddeen; Johnson, Isaac; Petroni, Fabio; Van-Strien, Daniel (eds.). Proceedings of the first workshop on advancing natural language processing for Wikipedia (pp. 91–101). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.wikinlp-1.14
Lubiana, Tiago; Rasberry, Lane; Mietchen, Daniel (2025). “The Wikidata query service split and its impact on the scholarly graph”. In: Chaves-Fraga, David; Heibi, Ivan; Garijo, Daniel; Collarana, Diego; Salatino, Angelo; Vahdati, Sahar (eds.). Proceedings of the SEMANTiCS Conference 2025, Vienna, Austria (03-05 Sept.). CEUR Workshop Proceedings, v. 4064. https://ceur-ws.org/Vol-4064/PD-paper3.pdf
Luo, Linhao; Zhao, Zicheng; Haffari, Gholamreza; Gong, Chen; Pan, Shirui (2024). “Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models”. arXiv preprint arXiv:2410.13080. https://doi.org/10.48550/arXiv.2410.13080
Moiseev, Fedor; Dong, Zhe; Alfonseca, Enrique; Jaggi, Martin (2022). “SKILL: Structured Knowledge Infusion for Large Language Models”. In: Carpuat, Marine; De-Marneffe, Marie-Catherine; Meza-Ruiz, Ivan-Vladimir (eds.). Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1581–1588). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.naacl-main.113
Pan, Jeff Z.; Razniewski, Simon; Kalo, Jan-Christoph; Singhania, Sneha; Chen, Jiaoyan; Dietze, Stefan; Jabeen, Hajira; Omeliyanenko, Janna; Zhang, Wen; Lissandrini, Matteo; Biswas, Russa; De-Melo, Gerard; Bonifati, Angela; Vakaj, Edlira; Dragoni, Mauro; Graux, Damien (2023). “Large Language Models and Knowledge Graphs: Opportunities and challenges”. Transactions on graph data and knowledge, v. 1, n. 1. https://doi.org/10.4230/TGDK.1.1.2
Saeedizade, Mohammad-Javad; Blomqvist, Eva (2024). “Navigating ontology development with Large Language Models”. In: Meroño-Peñuela, Albert; Dimou, Anastasia; Troncy, Raphaël; Hartig, Olaf; Acosta, Maribel; Alam, Mehwish; Paulheim, Heiko; Lisena, Pasquale (eds.). The Semantic Web. ESWC 2024 (Lecture Notes in Computer Science, v. 14664). Springer. https://doi.org/10.1007/978-3-031-60626-7_8
Sequeda, Juan; Allemang, Dean; Jacob, Bryon (2025). “Knowledge graphs as a source of trust for LLM-powered enterprise question answering”. Journal of web semantics, v. 85, 100858. https://doi.org/10.1016/j.websem.2024.100858
Wikimedia Foundation (2024). “Wikidata: Embedding Project”. Wikidata. https://www.wikidata.org/wiki/Wikidata:Embedding_Project
Wikimedia Foundation (2025). “Wikidata: Wikibase GraphQL prototype”. Wikidata. https://www.wikidata.org/wiki/Wikidata:Wikibase_GraphQL_prototype
Xu, Silei; Liu, Shicheng; Culhane, Theo; Pertseva, Elizaveta; Wu, Meng-Hsi; Semnani, Sina; Lam, Monica (2023). “Fine-tuned LLMs know more, hallucinate less with few-shot sequence-to-sequence semantic parsing over Wikidata”. In: Bouamor, Houda; Pino, Juan; Bali, Kalika (eds.). Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 5778–5791). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.353
Zhang, Bohui; Reklos, Ioannis; Jain, Nitisha; Meroño-Peñuela, Albert; Simperl, Elena (2023). “Using Large Language Models for Knowledge Engineering (LLMKE): A case study on Wikidata”. In: Razniewski, Simon; Kalo, Jan-Christoph; Singhania, Sneha; Pan, Jeff Z. (eds.). Joint proceedings of the 1st workshop on Knowledge Base Construction from Pre-Trained Language Models (KBC-LM) and the 2nd challenge on Language Models for Knowledge Base Construction (LM-KBC) (KBC-LM + LM-KBC 2023), Athens, Greece, November 6, 2023. CEUR Workshop Proceedings, v. 3577. https://ceur-ws.org/Vol-3577/paper8.pdf