Wikidata
Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation.[2] It is a common source of open data that Wikimedia projects such as Wikipedia,[3][4] as well as anyone else, can use under the CC0 public domain license. Wikidata is a wiki powered by the MediaWiki software, including its extension for semi-structured data, Wikibase. As of early 2025, Wikidata had 1.65 billion item statements (semantic triples).[5]
Concept
Wikidata is a document-oriented database focused on items, which represent any kind of topic, concept, or object. Each item is allocated a unique persistent identifier called its QID, a positive integer prefixed with the upper-case letter "Q".[a] Because the identifier is language-neutral, the basic information describing each item's topic can be translated without favouring any particular language.
Some examples of items and their QIDs are 1988 Summer Olympics (Q8470), love (Q316), Johnny Cash (Q42775), Elvis Presley (Q303), and Gorilla (Q36611).
Item labels do not need to be unique. For example, there are two items named "Elvis Presley": Elvis Presley (Q303), which represents the American singer and actor, and Elvis Presley (Q610926), which represents his self-titled album. However, the combination of a label and its description must be unique. To avoid ambiguity, an item's QID is hence linked to this combination.
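The QID format described above can be checked mechanically. The sketch below is an illustrative helper (not part of any Wikidata library) that validates the "Q"-plus-positive-integer pattern; the no-leading-zeros rule is an assumption based on QIDs being sequentially assigned positive integers:

```python
import re

# Illustrative pattern for the QID format: an upper-case "Q" followed by a
# positive integer (no leading zeros is an assumption, since QIDs are
# sequentially assigned positive integers).
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")

def is_valid_qid(identifier: str) -> bool:
    """Return True if the string matches the QID format."""
    return QID_PATTERN.fullmatch(identifier) is not None

# Examples from this article
assert is_valid_qid("Q303")       # Elvis Presley (the singer)
assert is_valid_qid("Q610926")    # Elvis Presley (the album)
assert not is_valid_qid("P856")   # property identifiers use "P" instead
```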
Main parts
A layout of the four main components of a phase-1 Wikidata page: the label, description, aliases, and interlanguage links
Fundamentally, an item consists of:
- An identifier (the QID), related to a label and a description.
- Optionally, multiple aliases and some number of statements (and their properties and values).
Statements
Statements are how any information known about an item is recorded in Wikidata. Formally, they consist of key–value pairs, which match a property (such as "author" or "publication date") with one or more values (such as "Sir Arthur Conan Doyle" or "1902"). For example, the informal English statement "milk is white" would be encoded by a statement pairing the property color (P462) with the value white (Q23444) under the item milk (Q8495).
Statements may map a property to more than one value. For example, the "occupation" property for Marie Curie could be linked with the values "physicist" and "chemist", to reflect the fact that she engaged in both occupations.[7]
Values may take on many types including other Wikidata items, strings, numbers, or media files. Properties prescribe what types of values they may be paired with. For example, the property official website (P856) may only be paired with values of type "URL".[8]
Optionally, qualifiers can be used to refine the meaning of a statement by providing additional information. For example, a "population" statement could be modified with a qualifier such as "point in time (P585): 2011" (as its own key–value pair). Values in the statements may also be annotated with references, pointing to a source backing up the statement's content.[9] As with statements, all qualifiers and references are property–value pairs.
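Taken together, a statement with its qualifiers and references can be pictured as nested property–value pairs. The following is a minimal sketch of that shape in plain Python (not the actual Wikibase data model or its JSON serialization), using the milk example above; the qualifier and reference IDs are illustrative:

```python
# A minimal sketch of a Wikidata statement as nested property–value pairs.
statement = {
    "item": "Q8495",       # milk
    "property": "P462",    # color
    "value": "Q23444",     # white
    "qualifiers": [
        # qualifiers are themselves property–value pairs
        {"property": "P585", "value": "2011"},  # point in time (illustrative)
    ],
    "references": [
        # references point to a source backing the statement (illustrative)
        {"property": "P854", "value": "https://example.org/source"},
    ],
}
```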
Properties
Each property has a numeric identifier prefixed with a capital P and a page on Wikidata with optional label, description, aliases, and statements. As such, there are properties with the sole purpose of describing other properties, such as subproperty of (P1647).
Properties may also define more complex rules about their intended usage, termed constraints. For example, the capital (P36) property includes a "single value constraint", reflecting the reality that (typically) territories have only one capital city. Constraints are treated as testing alerts and hints, rather than inviolable rules.[10]
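As an illustration of how a "single value constraint" can be surfaced as a hint rather than a hard rule, the sketch below scans a list of (item, property, value) triples and reports items that use the constrained property more than once. This is a simplified stand-in for Wikidata's actual constraint-checking machinery, and the sample triples are illustrative:

```python
from collections import defaultdict

def single_value_violations(triples, constrained_property):
    """Return items that use `constrained_property` more than once.

    Mirrors the spirit of Wikidata constraints: violations are
    reported as hints, not rejected as errors.
    """
    counts = defaultdict(int)
    for item, prop, _value in triples:
        if prop == constrained_property:
            counts[item] += 1
    return sorted(item for item, n in counts.items() if n > 1)

# Illustrative data: an item with two "capital" (P36) values
triples = [
    ("Q183", "P36", "Q64"),     # Germany -> Berlin
    ("Q183", "P36", "Q1055"),   # a second capital value (illustrative)
    ("Q142", "P36", "Q90"),     # France -> Paris
]
assert single_value_violations(triples, "P36") == ["Q183"]
```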
Before a new property is created, it needs to undergo a discussion process.[11][12]
The most used property is cites work (P2860), which was used on more than 290,000,000 item pages as of November 2023.[13]
Lexemes
In linguistics, a lexeme is a unit of lexical meaning representing a group of words that share the same core meaning and grammatical characteristics.[14][15] Similarly, Wikidata's lexemes are items with a structure that makes them more suitable to store lexicographical data. Since 2016, Wikidata has supported lexicographical entries in the form of lexemes.[16]
In Wikidata, lexicographical entries have a different identifier from regular item entries. These entries are prefixed with the letter L, such as in the example entries for book and cow. Lexicographical entries in Wikidata can contain statements, senses, and forms.[17] The use of lexicographical entries in Wikidata allows for the documentation of word usage, the connection between words and items on Wikidata, word translations, and enables machine-readable lexicographical data.
In 2020, lexicographical entries on Wikidata exceeded 250,000. The language with the most lexicographical entries was Russian, with a total of 101,137 lexemes, followed by English with 38,122 lexemes. There are over 668 languages with lexicographical entries on Wikidata.[18]
Entity schemas
In Wikidata, a schema is a data model that outlines the necessary attributes for a data item.[19][20] For instance, a data item that uses the attribute "instance of" with the value "human" would typically include attributes such as "place of birth," "date of birth," "date of death," and "place of death."[21] Entity schemas in Wikidata use the Shape Expressions (ShEx) language to describe the data in Wikidata items as Resource Description Framework (RDF) graphs.[22] The use of entity schemas in Wikidata helps address data inconsistencies and unchecked vandalism.[19]
In January 2019, development began on a new MediaWiki extension to enable storing ShEx in a separate namespace.[23][24] Entity schemas use different identifiers from those used for items, properties, and lexemes: they are stored with an "E" identifier, such as E10 for the entity schema of human data instances and E270 for the entity schema of building data instances. This extension has since been installed on Wikidata[25] and enables contributors to use ShEx for validating and describing Resource Description Framework data in items and lexemes. Any item or lexeme on Wikidata can be validated against an entity schema, which makes entity schemas an important tool for quality assurance.
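Real entity schemas are written in ShEx, but the core idea, checking an item's statements against a set of expected properties, can be sketched in a few lines. Everything here is a simplification; the schema contents and the item data are illustrative:

```python
# Simplified stand-in for an entity schema: a set of expected properties.
# A real ShEx schema is far more expressive (cardinalities, value types, ...).
HUMAN_SCHEMA = {"P569", "P19"}  # date of birth, place of birth

def missing_properties(item_statements: dict, schema: set) -> set:
    """Return the schema properties absent from an item's statements."""
    return schema - item_statements.keys()

# Illustrative item: instance of (P31) human (Q5), with a date of birth
item = {"P31": "Q5", "P569": "1867-11-07"}
assert missing_properties(item, HUMAN_SCHEMA) == {"P19"}
```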
Content
Wikidata's content collections include data for biographies,[26] medicine,[27] digital humanities,[28] and scholarly metadata through the WikiCite project.[29]
It includes data collections from other open projects including Freebase.[30]
Development
The creation of the project was funded by donations from the Allen Institute for AI, the Gordon and Betty Moore Foundation, and Google, Inc., totaling €1.3 million.[31][32] The development of the project is mainly driven by Wikimedia Deutschland under the management of Lydia Pintscher, and was originally split into three phases:[33]
- Centralising interlanguage links – links between Wikipedia articles about the same topic in different languages.
- Providing a central place for infobox data for all Wikipedias.
- Creating and updating list articles based on data in Wikidata and linking to other Wikimedia sister projects, including Meta-Wiki and Wikidata itself (interwiki links).
Initial rollout
Wikidata was launched on 29 October 2012 and was the first new project of the Wikimedia Foundation since 2006.[3][34][35] At launch, only the centralization of language links was available. This enabled items to be created and filled with basic information: a label (a name or title), aliases (alternative terms for the label), a description, and links to articles about the topic in the various language editions of Wikipedia (interwiki links).
Historically, a Wikipedia article would include a list of interlanguage links (links to articles on the same topic in other editions of Wikipedia, if they existed). Wikidata was originally a self-contained repository of interlanguage links.[36] Wikipedia language editions were not yet able to access Wikidata, so they needed to continue to maintain their own lists of interlanguage links.
On 14 January 2013, the Hungarian Wikipedia became the first to enable the provision of interlanguage links via Wikidata.[37] This functionality was extended to the Hebrew and Italian Wikipedias on 30 January, to the English Wikipedia on 13 February and to all other Wikipedias on 6 March.[38][39][40][41] After no consensus was reached over a proposal to restrict the removal of language links from the English Wikipedia,[42] they were automatically removed by bots. On 23 September 2013, interlanguage links went live on Wikimedia Commons.[43]
Statements and data access
On 4 February 2013, statements were introduced to Wikidata entries. The possible values for properties were initially limited to two data types (items and images on Wikimedia Commons), with more data types (such as coordinates and dates) to follow later. The first new type, string, was deployed on 6 March.[44]
The ability for the various language editions of Wikipedia to access data from Wikidata was rolled out progressively between 27 March and 25 April 2013.[45][46] On 16 September 2015, Wikidata began allowing so-called arbitrary access, or access from a given article of a Wikipedia to the statements on Wikidata items not directly connected to it. For example, it became possible to read data about Germany from the Berlin article, which was not feasible before.[47] On 27 April 2016, arbitrary access was activated on Wikimedia Commons.[48]
According to a 2020 study, a large proportion of the data on Wikidata consists of entries imported en masse from other databases by Internet bots, which helps to "break down the walls" of data silos.[49]
Query service and other improvements
On 7 September 2015, the Wikimedia Foundation announced the release of the Wikidata Query Service,[50] which lets users run queries on the data contained in Wikidata.[51] The service uses SPARQL as the query language and Blazegraph as its triplestore and graph database.[53][54] As of November 2018, there were at least 26 different tools that allowed querying the data in different ways.[52]
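A query against the service is ordinary SPARQL text sent over HTTP. The sketch below only builds the query string so the example stays self-contained; the `wdt:` prefix is one the Query Service predefines, and the endpoint URL in the comment is the public one:

```python
# Build a SPARQL query for the Wikidata Query Service. The service
# predefines the wd:/wdt: prefixes, so no PREFIX declarations are needed.
def capitals_query(limit: int = 5) -> str:
    """Return a SPARQL query listing countries and their capitals."""
    return (
        "SELECT ?country ?capital WHERE {\n"
        "  ?country wdt:P36 ?capital .   # capital (P36)\n"
        "}\n"
        f"LIMIT {limit}"
    )

# The resulting string can be sent to https://query.wikidata.org/sparql
query = capitals_query(3)
assert "wdt:P36" in query and "LIMIT 3" in query
```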
In 2021, Wikimedia Deutschland released the Query Builder,[55] "a form-based query builder to allow people who don't know how to use SPARQL" to write a query.
The Wikidata Embedding Project was made available in October 2025. It provides a vector-based semantic search tool, allowing plain-language queries, and supports the Model Context Protocol standard that makes the data more readily available to AI systems.[56] The project is a partnership between Wikimedia Deutschland, Jina.AI and DataStax, an IBM subsidiary.[56]
Logo
The bars on the logo contain the word "WIKI" encoded in Morse code.[57] It was created by Arun Ganesh and selected through community decision.[58]
Reception
In November 2014, Wikidata received the Open Data Publisher Award from the Open Data Institute "for sheer scale, and built-in openness".[59]
In December 2014, Google announced that it would shut down Freebase in favor of Wikidata.[60]
As of November 2018, Wikidata information was used in 58.4% of all English Wikipedia articles, mostly for external identifiers or coordinate locations. In aggregate, data from Wikidata is shown in 64% of all Wikipedias' pages, 93% of all Wikivoyage articles, 34% of all Wikiquote pages, 32% of all Wikisource pages, and 27% of Wikimedia Commons pages.[61]
As of December 2020, Wikidata's data was visualized by at least 20 external tools[62] and over 300 papers had been published about Wikidata.[63]
In 2025, Wikidata was recognised as a "digital public good" by the Digital Public Goods Alliance.[64]
Applications
- Wikidata's structured dataset has been used by virtual assistants such as Apple's Siri and Amazon Alexa.[65][66]
- The Mwnci extension can import data from Wikidata into LibreOffice Calc spreadsheets.[67]
- KDE Itinerary – a privacy conscious open source travel assistant that uses data from Wikidata[68]
- Google originally started a frame semantic parser project that aimed to parse the information on Wikipedia and transfer it into Wikidata by generating relevant statements using artificial intelligence.[69]
- MathQA – a mathematical question answering system[70]
- As of August 2025, Wikidata has been described as the world’s largest open-access knowledge graph.[71]
A systematic literature review of the uses of Wikidata in research was carried out in 2019.[72]
See also
Notes
[edit]- ^ Q is the first initial of Qamarniso Vrandečić (née Ismoilova), an Uzbek Wikimedian married to Wikidata co-developer Denny Vrandečić.[6]
References
- ^ "Wikidata's tenth anniversary has been celebrated in Tamale, Ghana, by the Dagbani Wikimedians User Group and two of its sister communities". 18 November 2022. Retrieved 4 October 2024.
Wikidata went live on October 29, 2012
- ^ Chalabi, Mona (26 April 2013). "Welcome to Wikidata! Now what?". Archived from the original on 2 October 2021. Retrieved 2 October 2021.
- ^ a b Wikidata (Archived 29 October 2012 at the Wayback Machine)
- ^ "Data Revolution for Wikipedia". Wikimedia Deutschland. 30 March 2012. Archived from the original on 23 October 2012. Retrieved 11 September 2012.
- ^ "Grafana". grafana.wikimedia.org. Retrieved 21 March 2024.
- ^ Vrandečić, Denny; Pintscher, Lydia; Krötzsch, Markus (30 April 2023). "Wikidata: The Making of". Companion Proceedings of the ACM Web Conference 2023. pp. 615–624. doi:10.1145/3543873.3585579. ISBN 9781450394192. S2CID 258377705.
- ^ "Help:Statements – Wikidata". www.wikidata.org. Archived from the original on 25 March 2019. Retrieved 20 February 2019.
- ^ "Help:Data type – Wikidata". www.wikidata.org. Archived from the original on 23 March 2019. Retrieved 20 February 2019.
- ^ "Help:Sources – Wikidata". www.wikidata.org. Archived from the original on 17 April 2019. Retrieved 20 February 2019.
- ^ "Help:Property constraints portal". Wikidata. Archived from the original on 1 June 2019. Retrieved 20 February 2019.
- ^ Cochrane, Euan (30 September 2016). "Wikidata as a digital preservation knowledgebase". openpreservation.org. Archived from the original on 5 January 2022. Retrieved 5 January 2022.
- ^ Samuel, John (15 August 2018). "Experimental IR Meets Multilinguality, Multimodality, and Interaction". Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2018. Lecture Notes in Computer Science. Vol. 11018. p. 129. doi:10.1007/978-3-319-98932-7_12. ISBN 978-3-319-98931-0.
- ^ "Wikidata:Database reports/List of properties/Top100". Archived from the original on 24 February 2023. Retrieved 18 November 2023.
- ^ Andreou, Marios (27 March 2019), "Lexemes", Linguistics, Oxford University Press, doi:10.1093/obo/9780199772810-0232, ISBN 978-0-19-977281-0, retrieved 17 August 2024
- ^ Bonami, Olivier; Boyé, Gilles; Dal, Georgette; Giraudo, Hélène; Namer, Fiammetta (23 August 2018). The Lexeme In Descriptive And Theoretical Morphology. Language Science Press. doi:10.5281/zenodo.1402520.
- ^ Nielsen, Finn Årup (2019), Hitzler, Pascal; Kirrane, Sabrina; Hartig, Olaf; de Boer, Victor (eds.), "Ordia: A Web Application for Wikidata Lexemes", The Semantic Web: ESWC 2019 Satellite Events, Lecture Notes in Computer Science, vol. 11762, Cham: Springer International Publishing, pp. 141–146, doi:10.1007/978-3-030-32327-1_28, ISBN 978-3-030-32326-4, retrieved 17 August 2024
- ^ "Wikidata:Lexicographical data/Documentation – Wikidata". www.wikidata.org. Archived from the original on 13 November 2018. Retrieved 13 November 2018.
- ^ Nielsen, Finn (May 2020) [2020-05]. Ionov, Maxim; McCrae, John P.; Chiarcos, Christian; Declerck, Thierry; Bosque-Gil, Julia; Gracia, Jorge (eds.). "Lexemes in Wikidata: 2020 status". Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020). Marseille, France: European Language Resources Association: 82–86. ISBN 979-10-95546-36-8.
- ^ a b Werkmeister, Lucas (2018). Schema Inference of Wikidata (PDF). Karlsruhe: Fakultät für Informatik, Karlsruhe Institute of Technology.
- ^ Hernández, Daniel; Hogan, Aidan; Krötzsch, M. (2015). "Reifying RDF: What Works Well With Wikidata?".
- ^ Erxleben, Fredo; Günther, Michael; Krötzsch, Markus; Mendez, Julian; Vrandečić, Denny (2014), "Introducing Wikidata to the Linked Data Web", Lecture Notes in Computer Science, Cham: Springer International Publishing, pp. 50–65, doi:10.1007/978-3-319-11964-9_4, ISBN 978-3-319-11963-2, retrieved 18 August 2024
- ^ Thornton, Katherine; Solbrig, Harold; Stupp, Gregory S.; Labra Gayo, Jose Emilio; Mietchen, Daniel; Prud’hommeaux, Eric; Waagmeester, Andra (2019), Hitzler, Pascal; Fernández, Miriam; Janowicz, Krzysztof; Zaveri, Amrapali (eds.), "Using Shape Expressions (ShEx) to Share RDF Data Models and to Guide Curation with Rigorous Validation", The Semantic Web, vol. 11503, Cham: Springer International Publishing, pp. 606–620, doi:10.1007/978-3-030-21348-0_39, hdl:10651/53418, ISBN 978-3-030-21347-3
- ^ "Extension:EntitySchema – MediaWiki". mediawiki.org. Archived from the original on 25 June 2021. Retrieved 10 September 2021.
- ^ "Initial empty repository". Gerrit. 15 January 2019. Archived from the original on 19 March 2022. Retrieved 12 June 2022.
- ^ "Version – Wikidata". Wikidata.org. Archived from the original on 19 October 2021. Retrieved 10 September 2021.
- ^ Chisholm, Andrew; Radford, Will; Hachey, Ben (April 2017). Lapata, Mirella; Blunsom, Phil; Koller, Alexander (eds.). "Learning to generate one-sentence biographies from Wikidata". Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. Valencia, Spain: Association for Computational Linguistics: 633–642. arXiv:1702.06235.
- ^ Turki, Houcemeddine; Shafee, Thomas; Hadj Taieb, Mohamed Ali; Ben Aouicha, Mohamed; Vrandečić, Denny; Das, Diptanshu; Hamdi, Helmi (November 2019). "Wikidata: A large-scale collaborative ontological medical database". Journal of Biomedical Informatics. 99 103292. doi:10.1016/j.jbi.2019.103292. PMID 31557529.
- ^ Zhao, Fudie (31 May 2023). "A systematic review of Wikidata in Digital Humanities projects". Digital Scholarship in the Humanities. 38 (2): 852–874. doi:10.1093/llc/fqac083.
- ^ Nielsen, Finn Årup; Mietchen, Daniel; Willighagen, Egon (2017). "Scholia, Scientometrics and Wikidata" (PDF). The Semantic Web: ESWC 2017 Satellite Events. Lecture Notes in Computer Science. Vol. 10577. pp. 237–259. doi:10.1007/978-3-319-70407-4_36. ISBN 978-3-319-70406-7.
- ^ Pellissier Tanon, Thomas; Vrandečić, Denny; Schaffert, Sebastian; Steiner, Thomas; Pintscher, Lydia (11 April 2016). "From Freebase to Wikidata: The Great Migration". Proceedings of the 25th International Conference on World Wide Web. WWW '16. Republic and Canton of Geneva, CHE: International World Wide Web Conferences Steering Committee. pp. 1419–1428. doi:10.1145/2872427.2874809. ISBN 978-1-4503-4143-1.
- ^ Dickinson, Boonsri (30 March 2012). "Paul Allen Invests In A Massive Project To Make Wikipedia Better". Business Insider. Archived from the original on 23 December 2017. Retrieved 11 September 2012.
- ^ Perez, Sarah (30 March 2012). "Wikipedia's Next Big Thing: Wikidata, A Machine-Readable, User-Editable Database Funded By Google, Paul Allen And Others". TechCrunch. Archived from the original on 5 October 2012. Retrieved 11 September 2012.
- ^ "Wikidata – Meta". meta.wikimedia.org. Archived from the original on 7 April 2012. Retrieved 8 November 2015.
- ^ Pintscher, Lydia (30 October 2012). "wikidata.org is live (with some caveats)". wikidata-l (Mailing list). Retrieved 3 November 2012.
- ^ Roth, Matthew (30 March 2012). "The Wikipedia data revolution". Wikimedia Foundation. Archived from the original on 31 July 2020. Retrieved 11 September 2012.
- ^ Leitch, Thomas (1 November 2014). Wikipedia U: Knowledge, Authority, and Liberal Education in the Digital Age. Johns Hopkins University Press. p. 120. ISBN 978-1-4214-1550-5.
- ^ Pintscher, Lydia (14 January 2013). "First steps of Wikidata in the Hungarian Wikipedia". Wikimedia Deutschland. Archived from the original on 14 December 2015. Retrieved 17 December 2015.
- ^ Pintscher, Lydia (30 January 2013). "Wikidata coming to the next two Wikipedias". Wikimedia Deutschland. Archived from the original on 4 October 2018. Retrieved 31 January 2013.
- ^ Pintscher, Lydia (13 February 2013). "Wikidata live on the English Wikipedia". Wikimedia Deutschland. Archived from the original on 19 February 2013. Retrieved 15 February 2013.
- ^ Pintscher, Lydia (6 March 2013). "Wikidata now live on all Wikipedias". Wikimedia Deutschland. Archived from the original on 14 April 2013. Retrieved 8 March 2013.
- ^ "Wikidata ist für alle Wikipedien da" (in German). Golem.de. Archived from the original on 6 November 2018. Retrieved 29 January 2014.
- ^ "Wikipedia talk:Wikidata interwiki RFC". 29 March 2013. Archived from the original on 18 October 2021. Retrieved 30 March 2013.
- ^ Pintscher, Lydia (23 September 2013). "Wikidata is Here!". Commons:Village pump. Archived from the original on 6 December 2021. Retrieved 30 August 2016.
- ^ Pintscher, Lydia. "Wikidata/Status updates/2013 03 01". Wikimedia Meta-Wiki. Wikimedia Foundation. Archived from the original on 12 April 2013. Retrieved 3 March 2013.
- ^ Pintscher, Lydia (27 March 2013). "You can have all the data!". Wikimedia Deutschland. Archived from the original on 29 March 2013. Retrieved 28 March 2013.
- ^ "Wikidata goes live worldwide". The H. 25 April 2013. Archived from the original on 1 January 2014.
- ^ Pintscher, Lydia (16 September 2015). "Wikidata: Access to data from arbitrary items is here". Wikipedia:Village pump (technical). Archived from the original on 27 September 2016. Retrieved 30 August 2016.
- ^ Pintscher, Lydia (27 April 2016). "Wikidata support: arbitrary access is here". Commons:Village pump. Archived from the original on 5 February 2017. Retrieved 30 August 2016.
- ^ Waagmeester, Andra; Stupp, Gregory; Burgstaller-Muehlbacher, Sebastian; et al. (17 March 2020). "Wikidata as a knowledge graph for the life sciences". eLife. 9. doi:10.7554/ELIFE.52614. ISSN 2050-084X. PMC 7077981. PMID 32180547. Wikidata Q87830400.
- ^ "Home". query.wikidata.org. Archived from the original on 7 November 2016. Retrieved 30 January 2019.
- ^ "[Wikidata] Announcing the release of the Wikidata Query Service - Wikidata - lists.wikimedia.org". Archived from the original on 10 November 2015. Retrieved 13 November 2018.
- ^ "Wikidata:Tools/Query data – Wikidata". www.wikidata.org. Archived from the original on 31 May 2020. Retrieved 13 November 2018.
- ^ "[Wikidata-tech] Wikidata Query Backend Update (take two!)". lists.wikimedia.org. Archived from the original on 6 January 2021. Retrieved 29 August 2018. (The message also contains a link to the graph databases comparison performed by Wikimedia.)
- ^ 86 on GitHub
- ^ "Wikidata Query Builder". query.wikidata.org.
- ^ a b Brandom, Russell (1 October 2025). "New project makes Wikipedia data more accessible to AI". TechCrunch. Retrieved 1 October 2025.
- ^ commons:File talk:Wikidata-logo-en.svg#Hybrid. Retrieved 2016-10-06.
- ^ "Und der Gewinner ist..." 13 July 2012. Archived from the original on 21 January 2021. Retrieved 16 June 2020.
- ^ "First ODI Open Data Awards presented by Sirs Tim Berners-Lee and Nigel Shadbolt". Open Data Institute. Archived from the original on 24 March 2016.
- ^ "Freebase". Google Plus. 16 December 2014. Archived from the original on 20 March 2019.
- ^ "Percentage of articles making use of data from Wikidata". Archived from the original on 15 November 2018. Retrieved 15 November 2018.
- ^ "Wikidata:Tools/Visualize data – Wikidata". www.wikidata.org. Archived from the original on 15 November 2018. Retrieved 15 November 2018.
- ^ "Scholia". Scholia. Archived from the original on 30 September 2021. Retrieved 2 August 2021.
- ^ "Wikidata". Digital Public Goods. Retrieved 29 October 2025.
- ^ Simonite, Tom (18 February 2019). "Inside the Alexa-Friendly World of Wikidata". Wired. ISSN 1059-1028. Retrieved 25 December 2020.
- ^ Merhav, Yuval; Ash, Steve (8 August 2018). "Automatic transliteration can help Alexa find data across language barriers". Amazon Science. Retrieved 3 February 2025.
- ^ "Rob Barry / Mwnci – Deep Spreadsheets". GitLab. Archived from the original on 21 September 2019. Retrieved 21 September 2019.
- ^ Krause, Volker (12 January 2020), KDE Itinerary – A privacy by design travel assistant, archived from the original on 26 June 2020, retrieved 10 November 2020
- ^ sling on GitHub
- ^ Scharpf, P. Schubotz, M. Gipp, B. Mining Mathematical Documents for Question Answering via Unsupervised Formula Labeling Archived 10 February 2023 at the Wayback Machine ACM/IEEE Joint Conference on Digital Libraries, 2022.
- ^ Caplan, Jeremy (25 August 2025). "Big Tech locks data away. Wikidata gives it back to the internet". Fast Company.
- ^ Mora-Cantallops, Marçal; Sánchez-Alonso, Salvador; García-Barriocanal, Elena (2 September 2019). "A systematic literature review on Wikidata". Data Technologies and Applications. 53 (3): 250–268. doi:10.1108/DTA-12-2018-0110. S2CID 202036639.
Further reading
- Mark Graham (6 April 2012), "The Problem With Wikidata", The Atlantic, US
- Claudia Müller-Birn, Benjamin Karran, Janette Lehmann, Markus Luczak-Rösch: "Peer-production system or collaborative ontology development effort: What is Wikidata?" In: OpenSym 2015 – Conference on Open Collaboration, San Francisco, US, 19–21 Aug 2015 (preprint).
External links
- Official website

- Videos: WikidataCon on media.ccc.de
- Wikidata Query Builder
History
Inception and Early Development
In 2011, Wikimedia Deutschland proposed the creation of Wikidata as a central repository to address key challenges in Wikipedia maintenance, particularly the decentralized management of interlanguage links and the repetitive updating of infobox content across multiple language editions.[7] This initiative aimed to centralize structured data, reducing duplication and errors that arose from editors manually synchronizing links and facts in over 280 language versions of Wikipedia at the time.[7] The proposal outlined a phased approach, starting with interlanguage links to streamline navigation between articles on the same topic in different languages, thereby easing the burden on volunteer editors, especially in smaller Wikipedias.[7]

Development of Wikidata officially began on April 1, 2012, in Berlin, under the leadership of Denny Vrandečić and Markus Krötzsch, who had earlier explored semantic enhancements for Wikipedia.[8] The project was initiated by Wikimedia Deutschland, with initial funding secured from Google, the Allen Institute for Artificial Intelligence, the Gordon and Betty Moore Foundation, and support from the Wikimedia Foundation, totaling approximately €1.3 million for the early stages.[9] During these initial months, the team focused on designing the core data model, which relied on simple property–value pairs to represent entities (such as linking a city to its population or coordinates), allowing for flexible, machine-readable storage without rigid schemas.[10]

The beta version of Wikidata launched on October 29, 2012, initially restricting editing to the creation of items and their connections via interlanguage links to Wikipedia articles, marking the project's first operational phase.[11] This limited scope enabled early testing of the centralized linking system while laying the groundwork for broader structured data integration in subsequent phases.[10]

Key Milestones and Rollouts
Wikidata's development was structured around three primary phases, each building on the previous to expand its functionality and integration with Wikimedia projects.

Phase 1, from 2012 to 2013, established the foundational infrastructure for centralizing interlanguage links, replacing the fragmented system where each Wikipedia maintained its own links to other language versions. The project launched in beta on October 29, 2012, initially permitting users to create items (unique identifiers for concepts) and add sitelinks connecting them to corresponding articles across Wikimedia sites.[10] Pilot testing began in January 2013 with the Hungarian, English, and French Wikipedias, and by March 6, 2013, interlanguage links were enabled across all Wikipedias, streamlining maintenance and improving multilingual navigation.[10][12]

Phase 2, in 2013, introduced the core data model, enabling Wikidata to store structured facts beyond mere linking. Statements, consisting of a property–value pair with optional qualifiers and references, were first added on February 4, 2013, initially supporting limited data types such as items and Wikimedia Commons media files. Properties, which define the types of relationships (e.g., "instance of" or "country"), and sitelinks were integrated as essential components, allowing items to represent real-world entities with verifiable claims. Editing of statements was opened to the public on February 4, 2013, marking the transition to full community-driven content creation and significantly increasing contributions.[10]

Phase 3, beginning in 2014, focused on practical applications and broader interoperability, including the integration of Wikidata data into Wikipedia infoboxes and the central coordination of external identifiers.
In July 2014, the English Wikipedia began widespread adoption of Lua modules to pull data from Wikidata into infobox templates, automating the display of structured information like birth dates or occupations while reducing redundancy across articles. This integration relied on properties dedicated to external identifiers (e.g., ISBN or GeoNames ID), positioning Wikidata as a hub for linking to external databases and enhancing data reuse beyond Wikimedia.[10]

Subsequent milestones extended Wikidata's scope to specialized data types. In May 2018, lexemes were introduced to support linguistic data, allowing the storage of words, forms, senses, and etymologies in multiple languages, thereby complementing Wiktionary and enabling queries on lexical relationships.[13] In 2019, entity schemas were launched using the Shape Expressions (ShEx) language to define and validate data models, helping enforce constraints on item structures and improve data quality through community-defined templates.[14] These advancements solidified Wikidata's role as a versatile, multilingual knowledge graph up to 2023.

Recent Advancements (2024–2025)
The Wikidata community conducted a comprehensive survey in July 2024, with results released on October 13, 2025, revealing key insights into contributors' backgrounds and involvement patterns. The report highlighted that editing data remains the most common activity, while priorities such as research with Wikidata and building applications are growing, informing future developments including enhanced data quality frameworks to support more reliable and reusable structured information.[15]

Technical enhancements continued with the launch of the "Search by entity type" feature on June 19, 2025, which introduced typeahead compatibility in the Wikidata search box and allowed users to filter results specifically to items, properties, or lexemes, significantly improving navigation for users seeking particular data classes.[1]

Accessibility efforts advanced through the introduction of a mobile editing prototype on June 12, 2025, enabling statement editing on items directly from mobile devices, a long-requested capability. Community feedback was actively solicited via video demonstrations and discussions, aiming to refine the tool for broader usability and inclusivity among editors on the go.[1]

The WikidataCon 2025 conference, held online from October 31 to November 2, 2025, and organized by Wikimedia Deutschland, gathered developers, editors, and organizations to explore advancements in linked open data, with a strong emphasis on AI integrations and collaborative tools for enhanced data connectivity.[16]

Wikimedia Deutschland's 2025 plan, outlined in February 2025 and aligned with strategic goals through 2030, prioritized scalability for Wikidata as part of broader linked open data infrastructure, targeting a doubling of technology contributors by 2030 to handle expanding data volumes and global participation.
The plan also supported machine editing initiatives by improving editing experiences and productivity tools, facilitating automated contributions while maintaining community oversight.[17]

On October 29, 2025, Wikidata received official recognition as a digital public good from the Digital Public Goods Alliance, affirming its role as an openly licensed, collaborative knowledge base with over 1.6 billion facts that promotes equitable access to information worldwide, the second Wikimedia project after Wikipedia to earn this distinction.[18]

Core Concepts
Items and Identifiers
Items are the primary entities in Wikidata, representing real-world topics, concepts, or objects such as people, places, events, or abstract ideas.[19] Each item serves as a unique container for structured data about its subject, enabling the centralized storage and reuse of information across Wikimedia projects without redundancy.[20] For instance, the item for Douglas Adams is identified as Q42, which encapsulates all relevant data about the author in one place.[20]

Every item is assigned a unique identifier known as a Q-ID, consisting of the letter "Q" followed by a sequential numeric code, such as Q42 or Q7186 for Marie Curie.[19] These Q-IDs ensure global uniqueness within Wikidata, preventing duplication by providing a single reference point for each entity regardless of language or project.[20] As of November 2025, Wikidata contains over 119 million items, forming the foundational scale of its knowledge base.[1]

The structure of an item includes monolingual labels, which provide the primary name in a specific language (e.g., "Douglas Adams" in English for Q42); descriptions, offering a brief disambiguating summary (e.g., "English writer and humorist (1952–2001)"); and aliases, listing alternative names or variants (e.g., "DNA" as an alias for Q42).[19] Additionally, sitelinks connect the item to corresponding pages on other Wikimedia sites, such as linking Q42 to the English Wikipedia article on Douglas Adams, facilitating seamless cross-project navigation and data synchronization.[20] This structure allows Q-IDs to enable efficient linking across projects; for example, a single item like Q8470 for the 1988 Summer Olympics can be referenced uniformly in multiple Wikipedias, avoiding the need for separate, inconsistent entries.[19]

Properties
Properties in Wikidata are reusable attributes that function as unique descriptors to define relationships and values for items, forming the predicates in the knowledge graph's triples. Each property has a dedicated page on Wikidata and is identified by a unique alphanumeric label consisting of the prefix "P" followed by a sequential number, referred to as a P-ID; for example, P31 denotes "instance of," which classifies an item as belonging to a specific class or category.[21]

The creation of properties follows a structured, community-governed process to maintain relevance and avoid redundancy. Proposals for new properties are submitted to the dedicated Property proposal forum, where editors discuss their necessity, proposed datatype, and constraints; approval requires consensus or sufficient support from the community. Upon approval, the property is created by users with appropriate permissions, such as property creators or administrators, and assigned the next available sequential numeric ID starting from P1. This process ensures that properties are introduced only when they address a clear need in describing entities.[21]

Properties are categorized by their datatype, which dictates the structure and validation of values they accept, enabling diverse representations of information. Common datatypes include those for geographical coordinates (e.g., P625, used for location data), dates and times (e.g., P569 for date of birth), and external identifiers that link to external databases (e.g., P345 for IMDb person ID or P214 for VIAF ID). These types support interoperability with other linked data systems. The most extensively used property is "cites work" (P2860), applied to over 312 million item pages as of November 2025, primarily for bibliographic citations in scholarly and creative works.[21][22]

To promote data integrity, properties incorporate constraints that enforce validation rules on associated statements.
Examples include format constraints to ensure values match expected patterns (e.g., ISO 8601 for dates), single-value constraints limiting a property to one value per item (e.g., for identifiers like ISBN), and type constraints verifying that values align with specified classes or formats. These are defined on the property's page and checked automatically during editing, aiding in error detection and quality control.

Statements
In Wikidata, statements form the fundamental units of structured data, representing assertions about entities through a subject-predicate-object triple structure. The subject is an item, such as a person, place, or concept identified by a unique Q-number (e.g., Q42 for Douglas Adams); the predicate is a property (e.g., P31 for "instance of"); and the object is the value, which can be another item (e.g., Q5 for "human"), a string, a quantity, a date, or other supported data types.[23] This triple-based model aligns with linked data principles, enabling interconnections across the knowledge base.[4] A representative example is the statement for Douglas Adams (Q42): instance of (P31) human (Q5), which establishes the item's classification as a person.[23]

Properties in Wikidata often permit multiple statements to accommodate real-world complexity, such as varying attributes over time or contexts, with each statement-value pair assigned a rank to indicate its status: preferred for the most reliable or current information, normal as the default, or deprecated for outdated or incorrect data.[23][24] Ranks help editors and consumers prioritize information without deleting historical details. These statements can be enhanced with qualifiers for additional context, such as specifying a time period or location, though core assertions remain self-contained.[23]

As of April 2025, Wikidata encompasses approximately 1.65 billion statements, supporting complex queries and integrations across Wikimedia projects and beyond; this scale underscores Wikidata's role as a vast, collaborative repository of verifiable facts.[25]

Lexemes
Lexemes were introduced to Wikidata in May 2018 to extend its data model beyond encyclopedic concepts, enabling the structured storage of linguistic and lexical information for words, phrases, and their variations across languages.[26] Unlike general items (Q-IDs), lexemes are specialized entities identified by unique L-IDs, such as L7 for the English noun "cat," allowing for the representation of language-specific lexical units.[27] This addition supports the integration of dictionary-like data, complementing projects like Wiktionary through shared identifiers and tools for cross-referencing.[26]

The core structure of a lexeme centers on its lemma, the canonical base form of the word (e.g., "cat" for L7), associated with a specific language item (such as Q1860 for English) and a lexical category denoting its grammatical role, like noun or verb.[28] Senses capture the distinct meanings of the lemma, each with a gloss and optional statements linking to related concepts; for instance, L7 includes senses for the domesticated animal and the musical instrument.[27] Forms represent inflected or derived variants, such as "cats" or "cat's," including textual representations and grammatical features like number (plural) or case, drawn from ontologies in the Linguistic Linked Open Data community.[28] These components allow lexemes to link to broader Wikidata items via statements, facilitating connections between lexical and encyclopedic knowledge.[29]

As of 2025, Wikidata's lexeme collection includes over 1.3 million entries across hundreds of languages, reflecting rapid community contributions.[30] This growth underscores lexemes' role in supporting Wiktionary integration, where data can be imported or exported to enrich dictionary entries.[26] Lexemes enable detailed linguistic annotations, such as etymological links tracing word origins; for example, the Afrikaans lexeme for "hond" (dog, L208466) connects through derivations to Dutch, Middle Dutch, Old Dutch,
Proto-Germanic, and ultimately Proto-Indo-European roots.[31] Pronunciation data, including International Phonetic Alphabet (IPA) transcriptions, is attached to forms and qualified by senses to specify contexts like regional accents.[28] These features promote applications in natural language processing and multilingual research by providing verifiable, interconnected lexical data.[31]

Entity Schemas
Entity schemas in Wikidata are declarative models that define the expected structure and constraints for classes of entities, enabling validation to ensure data consistency and quality. Launched on May 28, 2019, they utilize the Shape Expressions (ShEx) language, expressed in ShExC syntax, and are stored as a dedicated entity type in the EntitySchema namespace, identifiable by the prefix "E". This infrastructure allows users to specify required properties, their cardinalities, allowed values, and relationships for specific item classes, such as mandating a birth date property for items of the class "person" (Q215627). Unlike property constraints, which focus on individual properties, entity schemas provide holistic shape definitions for entire entity sets, including qualifiers and references.[14][32]

The primary purpose of entity schemas is to model and validate RDF-based data structures within Wikidata, facilitating the detection of inconsistencies or errors during editing. Community members propose and develop schemas through the WikiProject Schemas, with versioning supported via Wikidata's page history mechanism, allowing revisions and tracking of changes over time. Integration with editing tools enhances usability; for instance, ShExStatements enables schema generation from CSV files and validation against Wikidata items, while tools like Entityshape and WikiShape provide visual interfaces for creation and testing. These features promote collaborative maintenance, where schemas can be proposed via requests for comment (RfC) to standardize data structures for particular subjects.[32][33][34]

In practice, entity schemas support domain-specific applications, particularly in biomedicine, where they ensure consistent representation of entities like genes, proteins, and virus strains.
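The idea behind shape validation can be illustrated with a toy check in plain Python (not actual ShEx): a simplified "shape" here is just a set of required property IDs, and validation reports which of them an item's claims lack. The shape contents are illustrative.

```python
# Toy shape validation (plain Python, not actual ShEx): a simplified "shape"
# is a set of required property IDs; validation reports which required
# properties are missing from an item's claims.

def validate_shape(claims, required):
    """Return the set of required properties absent from the claims dict."""
    return set(required) - claims.keys()

HUMAN_SHAPE = {"P31", "P569"}             # instance of, date of birth (example)

item_claims = {"P31": [{"value": "Q5"}]}  # hypothetical item, no birth date
missing = validate_shape(item_claims, HUMAN_SHAPE)
print(sorted(missing))  # ['P569']
```

Real ShEx schemas go further, constraining cardinalities, value types, qualifiers, and references, but the validate-and-report pattern is the same.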
For example, schemas for molecular biology entities define mandatory properties such as sequence data or taxonomic classifications, aiding in the integration of biomedical ontologies and reducing variability in knowledge graph subsets. Research initiatives have proposed expanding these schemas to cover clinical entities, enhancing Wikidata's utility in health-related data modeling and validation.[35][36][37]

Data Structure and Management
Qualifiers, References, and Constraints
In Wikidata, qualifiers, references, and constraints serve as essential mechanisms to add context, verifiability, and validation to statements, which are the core property-value pairs representing knowledge about entities.[23] Qualifiers provide additional details to refine a statement's meaning, references link statements to supporting sources for credibility, and constraints enforce rules to maintain data consistency and prevent errors. Together, these features enhance the reliability and usability of Wikidata's knowledge graph by allowing nuanced, sourced, and structured information without altering the primary statement structure.

Qualifiers are property-value pairs attached to a statement to expand, annotate, or contextualize its main value, offering further description or refinement without creating separate statements.[38] For instance, a statement about the population of France (66,600,000) might include qualifiers such as "excluding Adélie Land" to specify territorial scope, or for Berlin's population (3,500,000), qualifiers like "point in time: 2005" and "method of estimation" clarify temporal and methodological aspects.[38] Similarly, a statement designating Louis XIV as King of France could be qualified with "start time: 14 May 1643" and "end time: 1 September 1715" to denote the duration of his reign.[38] These qualifiers modify the statement's interpretation, such as constraining its validity to a specific period or context, while avoiding ambiguity by not altering other qualifiers on the same statement.
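In simplified form, the Louis XIV statement above, with its qualifiers, a reference, and a rank, could be sketched as a plain dictionary. This is a sketch, not the exact Wikibase JSON (which wraps every value in a "snak" with datatype information); the property IDs are real (P39 position held, P580 start time, P582 end time, P248 stated in), while the two QID values are illustrative placeholders.

```python
# A single statement in simplified form (the real Wikibase JSON wraps each
# value in a "snak" with datatype information). Property IDs are real;
# the two QID values marked below are illustrative placeholders.

statement = {
    "property": "P39",            # position held
    "value": "Q123456",           # placeholder QID for "King of France"
    "rank": "preferred",          # preferred / normal / deprecated
    "qualifiers": {
        "P580": "1643-05-14",     # start time
        "P582": "1715-09-01",     # end time
    },
    "references": [
        {"P248": "Q654321"},      # stated in: placeholder source QID
    ],
}

# Consumers typically read qualifiers to scope the claim in time:
start = statement["qualifiers"].get("P580")
print(start)  # 1643-05-14
```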
By enabling such precision, qualifiers help resolve multiple possible values for a property and support community consensus on disputed facts through ranking mechanisms.[38]

References in Wikidata consist of property-value pairs that cite sources to back up a statement, ensuring its verifiability and traceability to reliable origins.[39] They typically employ properties like "stated in (P248)" to reference publications or items (e.g., books or journals) and "reference URL (P854)" for online sources, often supplemented with details such as author, publication date, or retrieval date.[39] For example, a statement about a scientific fact might reference the CRC Handbook of Chemistry and Physics via its Wikidata item, or an online claim could cite a specific webpage URL with the access date to account for potential changes.[39] References are required for most statements, except those involving common knowledge or self-evident data, and can be shared across multiple statements to promote efficiency. This sourcing practice upholds Wikidata's commitment to reliability, allowing users to verify claims against primary or authoritative materials like academic journals or official databases.[39]

Constraints are predefined rules applied to properties via the "property constraint (P2302)" property, functioning as editorial guidelines to ensure appropriate usage and detect inconsistencies in data entry.[40] Implemented through the Wikibase Quality Constraints extension, these rules, of which there are more than 30 types, fall into datatype-independent categories (e.g., single-value, which limits a property like place of birth to one value per entity) and datatype-specific ones (e.g., format, which validates identifiers against patterns like ISBN or email syntax).[40] For instance, a single-value constraint prevents duplicate entries for unique attributes, while a format constraint ensures telephone numbers adhere to expected structures.
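As a rough illustration of these two rule types, toy versions might look like the following; the real checks run inside the Wikibase Quality Constraints extension, not in client code, and the regex shown is a simplified stand-in, not an actual constraint definition.

```python
import re

# Toy versions of two constraint types (the real checks run inside the
# Wikibase Quality Constraints extension, not in client code like this).

def violates_single_value(claims, prop):
    """Single-value constraint: at most one statement for the property."""
    return len(claims.get(prop, [])) > 1

def violates_format(value, pattern):
    """Format constraint: the value must match the property's regex."""
    return re.fullmatch(pattern, value) is None

claims = {"P19": ["Q90", "Q64"]}             # two places of birth
print(violates_single_value(claims, "P19"))  # True

# A simplified ISBN-13-style pattern (illustrative, not a real constraint):
print(violates_format("978-0345391803", r"97[89]-\d{10}"))  # False (matches)
```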
Violations are reported to logged-in editors via tools like the constraint report, though exceptions can be explicitly noted using qualifiers like "exception to constraint (P2303)" for edge cases, such as a fictional entity defying real-world rules. By providing these checks, constraints proactively prevent errors, promote data quality, and guide contributors toward consistent modeling, ultimately bolstering the graph's integrity without imposing rigid enforcement.[40]

Editing Processes and Tools
Human editing on Wikidata primarily occurs through a web-based interface accessible via the project's main site, where users can search for existing items using titles or identifiers and create new ones if none exist.[41] To create an item, editors enter a label (the primary name in a chosen language) and a brief description to disambiguate it, followed by adding aliases for alternative names and interwiki links to corresponding Wikipedia articles in various languages.[41] Once created, statements (structured triples consisting of an item, property, and value) can be added directly in the interface, with options to include qualifiers and references for precision.[42]

For larger-scale human contributions, tools like QuickStatements enable batch uploads by allowing editors to input simple text commands or CSV files to add or modify labels, descriptions, aliases, statements, qualifiers, and sources across multiple items.[43] This tool, developed by Magnus Manske, processes commands sequentially via an import interface or external scripts, making it suitable for importing data from spreadsheets without needing programming knowledge, though users must supply existing item identifiers (QIDs) for accurate targeting.[43] Similarly, OpenRefine supports reconciliation of external datasets with Wikidata by matching values in tabular data (e.g., names or identifiers) to existing items through a dedicated service, flagging ambiguities for manual review and enabling bulk additions of new statements or links.[44] OpenRefine's process involves selecting a reconciliation endpoint (such as the Wikidata-specific API), restricting matches by entity types or languages, and using property paths to pull in details like labels or sitelinks for augmentation.[44]

Machine editing on Wikidata is governed by strict guidelines to ensure quality and prevent disruption, with bots (automated or semi-automated scripts) requiring separate accounts flagged as "bot" and operator contact information.[45]
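Whether pasted into the QuickStatements import interface by hand or produced by a script feeding a bot workflow, such batches are plain tab-separated commands. A hedged sketch of generating a version-1 batch follows; the item created is hypothetical, and the exact syntax should be checked against the QuickStatements documentation before running a real import.

```python
# Sketch of generating QuickStatements (version 1) commands: tab-separated
# lines of subject, property, value. "CREATE" starts a new item, "LAST"
# refers to the item just created, and Len/Den set an English label and
# description. Item values are bare QIDs; text values are double-quoted.

def qs_new_item(label, description, statements):
    lines = [
        "CREATE",
        f'LAST\tLen\t"{label}"',
        f'LAST\tDen\t"{description}"',
    ]
    lines += [f"LAST\t{prop}\t{value}" for prop, value in statements]
    return "\n".join(lines)

# Hypothetical batch creating one illustrative item:
batch = qs_new_item("Example Person", "illustrative test item",
                    [("P31", "Q5")])  # instance of: human
print(batch)
```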
Approval for bot flags is obtained through community requests on Meta-Wiki, where proposals detail the bot's purpose, such as importing identifiers from external databases (e.g., ISBNs or GeoNames IDs), and undergo review for compliance with edit frequency limits and error-handling mechanisms; global bots may receive automatic approval for specific tasks like interwiki maintenance.[45] Once approved, bots operate under reduced visibility in recent changes to avoid overwhelming human editors, but they must pause or be blocked if malfunctions occur, with flags revocable after discussion or prolonged inactivity.[45]

Collaboration and maintenance rely on version history, which tracks all edits to an item with timestamps, user attributions, and diffs for comparison, allowing reversion to prior states via the "history" tab.[42] Talk pages associated with each item facilitate discussions on proposed changes, disputes, or improvements, mirroring Wikimedia's broader discussion norms.[42] Reversion tools integrated into the interface enable quick undoing of errors or vandalism, often used in tandem with watchlists to monitor items.[42]

Wikidata's community upholds norms emphasizing notability for items, requiring that they support Wikimedia projects, link to reliable sources, or fill structural roles, while promoting neutrality through unbiased descriptions and balanced statements.[42] All claims must be sourced to verifiable references, such as published works or databases, with unsourced statements discouraged and subject to removal; editors are encouraged to join WikiProjects for coordinated adherence to these standards.[42]

Content Scope and Quality Control
Wikidata's content scope encompasses structured data across diverse domains, including over 10 million biographies of humans marked as instances of the "human" class (Q5), detailed geographic entities such as locations and administrative divisions, medical concepts like diseases and treatments, and scholarly metadata through initiatives like WikiCite for citations and references. This breadth supports interoperability with Wikimedia projects and external applications while adhering to strict verifiability standards, ensuring all entries draw from reliable, published sources rather than primary data collection. Notably, Wikidata explicitly excludes original research, personal opinions, or unpublished material, positioning it as a secondary knowledge base that aggregates and links to authoritative references such as academic publications, official databases, and news outlets.[46][47][48]

To facilitate coordinated development within these thematic areas, Wikidata relies on community-driven WikiProjects that focus on specific domains, providing guidelines, property standards, and collaborative tasks. For instance, WikiProject Music standardizes properties like performer (P175), instrument (P1303), and release identifiers (e.g., Discogs master ID, P1954) to enhance coverage of compositions, artists, albums, and genres, while enabling cross-project data mapping from Wikipedia and Commons. These projects promote thematic consistency by organizing SPARQL queries for gap analysis, encouraging contributor participation through chat channels and task lists, and ensuring alignment with broader Wikidata schemas without imposing rigid notability criteria.[49][50]

Quality control mechanisms emphasize proactive detection and community oversight to uphold data integrity.
Database reports, such as those tracking constraint violations, systematically scan for non-compliance with predefined rules, like mandatory qualifiers or format constraints, listing affected items and statements for editors to review and resolve, thereby preventing structural degradation. Community-voted deletions further support maintenance, allowing proposals for removing redundant or erroneous properties and items through dedicated request pages, where consensus guides administrative action. These tools integrate with editing interfaces to flag issues in real time, drawing on templates like the Constraint template for automated validation.[51][52][53]

Despite these safeguards, challenges persist in maintaining accuracy, particularly in vandalism detection and multilingual consistency. Vandalism, often involving disruptive edits like false statements or mass deletions, is mitigated through machine learning classifiers that analyze revision features, such as edit patterns and abuse filter tags, to identify 89% of cases while reducing patroller workload by 98%, as demonstrated in research prototypes adaptable to Wikidata's abuse filter. Multilingual consistency presents another hurdle, with studies revealing issues like duplicate entities, missing triples, and taxonomic inconsistencies across language versions, exacerbated by varying editorial priorities and source availability, though constraint checks and cross-lingual queries help address them.[54][55]

Technical Infrastructure
Software Foundation
Wikidata's software foundation is built upon the MediaWiki platform, which provides the core wiki engine for collaborative editing and version control.[56] The Wikibase extension suite transforms MediaWiki into a structured data repository, enabling the creation, management, and querying of entities such as items and properties in a versioned, multilingual format.[57] This integration allows Wikidata to leverage MediaWiki's established infrastructure while adding specialized capabilities for knowledge graph operations.[58]

Data storage in Wikidata relies on a dual approach to handle both relational and graph-based needs. Items and properties are primarily stored in a MySQL database, which supports the revision history, entity metadata, and structured attributes through Wikibase's schema.[59] For RDF representations, the system uses Blazegraph as a triplestore to manage billions of RDF triples derived from Wikidata entities, facilitating efficient SPARQL queries via the Wikidata Query Service.[6] This separation ensures robust handling of both editable content and semantic linkages.

To address the scale of Wikidata's growing dataset, the infrastructure incorporates scalability features such as sharding and caching.
Horizontal sharding partitions data across multiple Blazegraph nodes to distribute query loads and manage edit propagation, with ongoing efforts to optimize entity-based splitting.[60] Caching mechanisms, including in-memory stores and diff-based updates, reduce latency by minimizing redundant computations during data synchronization.[60] The entire system is hosted on servers managed by the Wikimedia Foundation in data centers across multiple locations, ensuring high availability and global access.[61]

Wikibase and its components are released under the GNU General Public License version 2.0 or later (GPL-2.0-or-later), promoting open-source development and allowing independent installations of Wikibase repositories beyond Wikidata.[62] This licensing aligns with MediaWiki's copyleft model, fostering community contributions and reuse in diverse structured data projects.

Query Services and Data Access
Wikidata provides several mechanisms for retrieving and manipulating its structured data, enabling users and applications to access the knowledge graph efficiently. The primary query service is the Wikidata Query Service (WDQS), consisting of SPARQL endpoints launched in September 2015 that support complex, federated queries across Wikidata's RDF triples and external linked data sources.

In May 2025, to enhance scalability, the WDQS backend was updated to split the dataset into a main graph (accessible via query-main.wikidata.org or the redirected query.wikidata.org) and a scholarly graph (query-scholarly.wikidata.org), with a legacy full-graph endpoint (query-legacy-full.wikidata.org) available until December 2025. Queries spanning both graphs now require SPARQL federation. The Wikimedia Foundation is also searching for a replacement for Blazegraph, the current triplestore backend, due to its lack of updates since 2018.[63][64][65][6]

This service allows for sophisticated pattern matching and filtering, such as retrieving all instances of cities with a population exceeding 1 million, by leveraging predicates like wdt:P1082 for population and wdt:P31 for instance-of relations.[66]
In addition to SPARQL, Wikidata offers programmatic access through APIs tailored for different operations. The MediaWiki Action API facilitates both read and write interactions with entities, supporting actions like fetching entity data via wbgetentities or editing statements through wbeditentity.[67] Complementing this, the Wikibase REST API provides a modern, stateless interface primarily for entity retrieval, such as obtaining JSON representations of items or properties without the overhead of session-based authentication.[68] These APIs adhere to standard HTTP practices, with endpoints like https://www.wikidata.org/w/api.php for the Action API and https://www.wikidata.org/w/rest.php for REST operations, ensuring compatibility with a wide range of client libraries and tools.[30]
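An Action API call can be illustrated by building (without sending) a wbgetentities request URL; the parameter names below follow the module's documented interface, and the sketched response shape is a rough outline rather than the full schema.

```python
from urllib.parse import urlencode

# Build (without sending) a read request against the MediaWiki Action API.
# wbgetentities returns entity JSON keyed by QID, roughly:
#   {"entities": {"Q42": {"labels": {"en": {"value": "Douglas Adams"}}, ...}}}

API = "https://www.wikidata.org/w/api.php"
params = {
    "action": "wbgetentities",
    "ids": "Q42",                           # Douglas Adams
    "props": "labels|descriptions|claims",
    "languages": "en",
    "format": "json",
}
url = f"{API}?{urlencode(params)}"
print(url)
```

Any HTTP client can then fetch the URL; write actions such as wbeditentity additionally require authentication and an edit token.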
To illustrate basic querying, SPARQL SELECT patterns form the foundation of WDQS interactions. A simple example retrieves all humans born in the 20th century:
SELECT ?human ?humanLabel ?birthDate
WHERE {
?human wdt:P31 wd:Q5 . # instance of human
?human wdt:P569 ?birthDate . # date of birth
FILTER(YEAR(?birthDate) >= 1900 && YEAR(?birthDate) < 2000) .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
Federated queries can also incorporate external SPARQL endpoints through the SERVICE keyword, expanding results beyond Wikidata's core dataset.[6]
The query services incorporate safeguards to maintain performance and reliability. WDQS enforces timeouts, typically set to 60 seconds for public queries, to prevent resource exhaustion from computationally intensive operations, alongside result limits such as a maximum of 10,000 rows per response to balance load.[70] Ongoing improvements include query optimization techniques, like index utilization in the underlying Blazegraph engine, and integration with user-friendly interfaces such as the Wikidata Query Service's built-in editor, which offers syntax highlighting, prefix autocompletion, and visualization of results as tables or graphs.[6] These enhancements, combined with tools like Query Helper for visual query building, lower the barrier for non-experts while supporting advanced federated explorations.[71]
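For completeness, a WDQS request can likewise be composed as a plain GET URL against the public /sparql endpoint, here asking for SPARQL JSON results via format=json (a sketch; the endpoint also honors Accept headers, and queries exceeding the public timeout are aborted).

```python
from urllib.parse import urlencode

# Compose a GET URL for the public WDQS endpoint; format=json requests
# SPARQL JSON results. Queries exceeding the ~60-second public timeout
# described above are aborted by the service.

ENDPOINT = "https://query.wikidata.org/sparql"
sparql = "SELECT ?human WHERE { ?human wdt:P31 wd:Q5 } LIMIT 5"
url = f"{ENDPOINT}?{urlencode({'query': sparql, 'format': 'json'})}"
print(url)
```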
Integrations and Exports
Wikidata provides regular data exports to facilitate integration with external systems and applications. Weekly dumps of the entire database are generated in multiple formats, including JSON and RDF, which are recommended for their stable and canonical representations of entities, properties, and statements.[72] These dumps enable developers and organizations to download and process the full dataset offline, supporting use cases such as data mirroring, analysis, and custom knowledge graph construction. Additionally, full database downloads are available via torrent, providing an efficient method for obtaining large volumes of data, with recent dumps accessible through official and unofficial torrent links maintained by the Wikimedia community.[72][73]

A key aspect of Wikidata's integrations involves interlinking its entities with other datasets to promote cross-referencing and alignment. For instance, Wikidata maintains connections to DBpedia, a structured knowledge base derived from Wikipedia, allowing bidirectional linking that enhances semantic web interoperability.[74] Similarly, links to Europeana, a digital cultural heritage aggregator, are established through property statements that map Wikidata items to Europeana records, facilitating enriched metadata for historical and artistic resources.[75] This interlinking is primarily achieved via external identifier properties, dedicated source-specific ID properties (such as P214 for VIAF ID) that align entities across disparate datasets, ensuring consistent entity resolution without duplicating content.[76]

The underlying Wikibase software, which powers Wikidata, is open-source and supports custom installations for organizations seeking tailored knowledge bases.
Museums and cultural institutions have adopted Wikibase for this purpose, such as the Botanical Garden and Botanical Museum Berlin, which uses it to manage semantic descriptions of plant specimens and related metadata.[77] Another example is Rhizome, a New York-based arts organization, which has employed Wikibase since 2015 to archive and link born-digital art, demonstrating its flexibility for domain-specific data management.[78]

All structured data on Wikidata, including statements, properties, and lexemes, is released under the Creative Commons CC0 1.0 Dedication, waiving all copyright and related rights to the fullest extent permitted by law.[79] This public domain status allows unrestricted reuse, modification, and distribution worldwide, making the data freely available for commercial, non-commercial, or derivative works without attribution requirements.[80]

Usage and Applications
Role in Wikimedia Projects
Wikidata functions as the central storage for interlanguage links across all Wikimedia projects, particularly enabling seamless connections between articles in different languages on various Wikipedias. This migration began in February 2013, with the process aiming to centralize links in a single database to reduce maintenance efforts, and was completed across all Wikipedias by the end of that year.[81] Sitelinks on Wikidata items directly power these connections, allowing editors to add or update links from a unified interface.[81]

In Wikipedia, Wikidata provides structured data that populates infoboxes through templates such as {{Infobox}}, which retrieve properties like population or coordinates directly from Wikidata items. As of July 2024, approximately 27% of English Wikipedia content pages include requests for Wikidata statements, often manifesting in these infoboxes to enhance article completeness without redundant editing across language versions.[82] This integration reduces duplication and ensures consistency, with examples including biographical details or geographic information drawn from verified Wikidata claims.[82]

Wikivoyage leverages Wikidata to structure travel itineraries and listings, incorporating data like coordinates for mapping points of interest and details such as entrance fees or operating hours for attractions. Similarly, in Wikisource, Wikidata supports structured representation of texts through bibliographic metadata, including author details, publication years, and editions, retrieved via Lua modules for index pages and headers.[83] This allows automatic generation of consistent information across works, minimizing manual entry and enabling cross-project synchronization.[83]

Wikidata further enhances Wikimedia projects with dynamic visualizations, such as interactive maps generated by Lua modules like Module:Mapframe, which pull coordinates and shapes from item properties to display locations in articles.
Timelines can also be created using Lua scripting to sequence events or dates sourced from Wikidata, integrating seamlessly into Wikipedia and sister project pages for chronological representations.

External and Research Applications
Wikidata has found significant application in the GLAM (galleries, libraries, archives, and museums) sector, where it serves as a collaborative platform for standardizing and linking metadata across cultural institutions. Libraries leverage Wikidata to create and publish linked open data, enabling the integration of bibliographic records with broader knowledge graphs for improved discoverability and interoperability.[84] For instance, tools like OpenRefine facilitate bulk reconciliation and editing of library catalogs against Wikidata, allowing institutions to map local metadata to global identifiers and enhance data sharing.[85] Museums and archives similarly use Wikidata to connect collection items, such as artworks or artifacts, to authoritative descriptions, supporting cross-institutional projects that build universal languages for cultural heritage data.[86] A 2024 systematic review of Wikidata's adoption in GLAM institutions highlights its role in fostering open data initiatives, emphasizing metadata harmonization and community-driven curation.[87] The Association of Research Libraries' 2019 white paper further recommends Wikidata for GLAM workflows, noting its potential to address silos in metadata management through structured, reusable properties.[88]

In digital humanities, Wikidata supports interdisciplinary research by providing a flexible repository for historical and cultural data visualization. Tools like Histropedia enable the creation of interactive timelines derived from Wikidata queries, allowing scholars to map events, figures, and relationships across time with links to underlying sources.[89] This visualization capability transforms structured data into narrative-driven explorations, such as chronological overviews of historical periods or biographical trajectories.
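Timeline tools such as Histropedia are driven by SPARQL queries against the Wikidata Query Service. A minimal sketch of such a query, held as a Python string so the example stays offline (P31 "instance of", Q5 "human", and P569 "date of birth" are real Wikidata identifiers; the filter year is arbitrary):

```python
# A timeline-style SPARQL query of the kind submitted to
# https://query.wikidata.org/sparql: people born in a given year,
# with English labels resolved by the label service.
TIMELINE_QUERY = """
SELECT ?person ?personLabel ?born WHERE {
  ?person wdt:P31 wd:Q5 ;        # instance of: human
          wdt:P569 ?born .       # date of birth
  FILTER(YEAR(?born) = 1685)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20
""".strip()

print(TIMELINE_QUERY)
```

A timeline front end would POST this query to the endpoint, then plot each result row at its `?born` date; swapping P569 for another date property (e.g. a publication or event date) yields timelines over other entity classes.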
A 2023 systematic review of Wikidata in digital humanities projects, covering literature from 2019 to 2022, analyzed 50 initiatives and found that it is predominantly used as a content provider and integration platform, with applications in entity resolution and semantic enrichment for textual analysis.[90] The review identifies key challenges, including data sparsity in niche domains, but underscores Wikidata's value in enabling reproducible, collaborative humanities scholarship through SPARQL queries and property extensions.[91]

Wikidata's utility in research extends to scholarly metadata and digital preservation, where it acts as a centralized hub for bibliographic and technical information. The WikiCite initiative structures citations and references within Wikidata, compiling open bibliographic metadata for scholarly articles, books, and datasets to support citation tracking and literature reviews across disciplines.[92] This includes properties for authors, publication dates, DOIs, and peer-review status, facilitating the creation of knowledge graphs for academic provenance. In digital preservation, Wikidata stores machine-readable metadata about software, file formats, and computing environments, aiding long-term access to digital artifacts.[93] For example, items on historical computing devices, such as early mainframes or programming languages, incorporate preservation-relevant details like emulation requirements and format specifications, enabling registries for software sustainability.[94] The 2019 iPRES conference paper on Wikidata for digital preservation emphasizes its infrastructure for syndicating such metadata, promoting community contributions to mitigate obsolescence in computing history.[95]

Recent AI integrations have expanded Wikidata's reach by enhancing machine-readable access to its structured data.
The Wikidata Embedding Project, launched on October 1, 2025, introduces a vector database that applies semantic embeddings to over 120 million entries, enabling efficient similarity searches and integration with large language models for knowledge retrieval.[96] This open-source initiative, developed by Wikimedia Deutschland, provides APIs for developers to query Wikidata semantically, supporting applications in natural language processing and recommendation systems while prioritizing data privacy and openness.

Applications like Inventaire utilize Wikidata for resource mapping, particularly in bibliographic domains, by building a CC0-licensed database of books and entities to facilitate peer-to-peer sharing and inventory management.[97] Inventaire's model reconciles user-contributed data with Wikidata items via ISBNs and titles, creating federated networks for cultural resource discovery.[98]

Impact and Reception
Growth and Adoption Statistics
Wikidata's item count has expanded significantly since its early years, reaching approximately 10 million items by October 2014 and surpassing 119 million items by late 2025.[99][100] The knowledge base encompassed 1.65 billion statements as of April 2025, reflecting the accumulation of structured data across entities.[25] Edit activity remains robust, with around 13 million edits performed monthly in October 2025, contributing to a cumulative total exceeding 2.4 billion edits since launch.[101][100]

Contributions are driven by a community of nearly 29,000 active registered editors each month, alongside anonymous users and approximately 1,000 active bots; bots account for about 52% of all edits.[100][102] Power users, who engage in high-volume and complex editing sessions, dominate alongside bots, while casual editors typically make fewer, simpler contributions.[103][104]

Wikidata's global accessibility is enhanced by labels in over 300 languages, supporting multilingual data representation.[105] API traffic underscores its adoption, with the SPARQL query service processing approximately 10,000 requests per minute and total page views reaching 708 million in October 2025.[106][101]

Awards, Criticisms, and Recognition
Wikidata has received notable recognition for its contributions to open data. In 2014, it was awarded the Open Data Publisher Award by the Open Data Institute for demonstrating high publishing standards and innovative use of challenging data on a massive scale.[8] More recently, in 2025, Wikidata was officially recognized as a digital public good by the Digital Public Goods Alliance, highlighting its role in promoting sustainable digital development and open access to structured knowledge worldwide.[107]

Despite these accolades, Wikidata has faced criticism regarding data quality, particularly the prevalence of unsourced claims that can propagate inaccuracies across linked projects. Studies have identified issues such as missing references and constraint violations, which undermine reliability in certain domains.[108] Additionally, coverage biases have been noted, with content disproportionately focused on Western-centric topics, leading to underrepresentation of non-Western cultural and demographic knowledge.[109]

Reception of Wikidata has been largely positive, especially for advancing open access to structured data, as reflected in community celebrations like its fifth birthday in 2017, where contributors emphasized its growth into a vital hub for inter-language knowledge linking.[110] Research on editor dynamics further illustrates evolving participation, with analyses showing how power users and standard contributors adapt their editing behaviors over time, contributing to sustained development.[104]

In response to quality concerns, the Wikidata community has initiated frameworks to address these challenges, including a 2024 referencing quality scoring system that evaluates sources based on Linked Data dimensions like completeness and understandability.[111] These efforts aim to enhance trustworthiness and mitigate biases through collaborative governance.

References
- https://www.wikidata.org/wiki/Wikidata:Main_Page
- https://meta.wikimedia.org/wiki/Wikidata
- https://meta.wikimedia.org/wiki/Event:Improving_Nigeria_items_on_Wikidata_to_celebrate_Wikidata_12th_birthday
- https://www.wikidata.org/wiki/Wikidata:Data_model
- https://www.wikidata.org/wiki/Help:Data_type
- https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service
- https://meta.wikimedia.org/wiki/Wikidata/Technical_proposal
- https://www.wikidata.org/wiki/Event:WikidataCon_2025
- https://meta.wikimedia.org/wiki/Wikimedia_Deutschland/Plan_2025/en
- https://www.wikidata.org/wiki/Help:Items
- https://www.wikidata.org/wiki/Wikidata:Glossary
- https://www.wikidata.org/wiki/Help:Properties
- https://www.wikidata.org/wiki/Wikidata:Database_reports/List_of_properties/Top100
- https://www.wikidata.org/wiki/Help:Statements
- https://www.wikidata.org/wiki/Help:Ranking
- https://www.wikidata.org/wiki/Wikidata:Lexicographical_data
- https://www.wikidata.org/wiki/Lexeme:L7
- https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation
- https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation/Lexeme_statements
- https://www.wikidata.org/wiki/Wikidata:Data_access
- https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas
- https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Wikidata_to_use_data_schemas_to_standardise_data_structure_on_a_subject
- https://www.wikidata.org/wiki/Wikidata:Database_reports/EntitySchema_directory
- https://meta.wikimedia.org/wiki/Research:Adapting_Wikidata_to_support_clinical_practice_using_Data_Science%2C_Semantic_Web_and_Machine_Learning
- https://www.wikidata.org/wiki/Help:Qualifiers
- https://www.wikidata.org/wiki/Help:Sources
- https://www.wikidata.org/wiki/Help:Property_constraints_portal
- https://meta.wikimedia.org/wiki/QuickStatements
- https://meta.wikimedia.org/wiki/Meta:Bots
- https://www.wikidata.org/wiki/Wikidata:Verifiability
- https://www.wikidata.org/wiki/Wikidata:WikiProject_Biographical_Identifiers
- https://www.wikidata.org/wiki/Wikidata:WikiProject_Music
- https://www.wikidata.org/wiki/Wikidata:WikiProjects
- https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations
- https://www.wikidata.org/wiki/Template:Constraint
- https://www.wikidata.org/wiki/Wikidata:Properties_for_deletion
- https://meta.wikimedia.org/wiki/Research:Building_automated_vandalism_detection_tool_for_Wikidata
- https://www.mediawiki.org/wiki/Extension:Wikibase
- https://www.mediawiki.org/wiki/Wikibase
- https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/ScalingStrategy
- https://meta.wikimedia.org/wiki/Wikimedia_servers
- https://www.mediawiki.org/wiki/Extension:Wikibase_Client
- https://www.wikidata.org/wiki/Q20950365
- https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual
- https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_update/May_2025_scaling_update
- https://www.mediawiki.org/wiki/API:Action_API
- https://www.wikidata.org/wiki/Wikidata:REST_API
- https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples
- https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/query_limits
- https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Query_Helper
- https://www.wikidata.org/wiki/Wikidata:Database_download
- https://meta.wikimedia.org/wiki/Data_dump_torrents
- https://www.wikidata.org/wiki/Wikidata:External_identifiers
- https://www.wikidata.org/wiki/Wikidata:Licensing
- https://www.wikidata.org/wiki/Help:Copyrights
- https://meta.wikimedia.org/wiki/Wikidata/Help
- https://meta.wikimedia.org/wiki/WikiCite
- https://www.wikidata.org/wiki/Wikidata:Embedding_Project
- https://www.wikidata.org/wiki/Wikidata:Statistics
- https://www.wikidata.org/wiki/Help:Label

