Wikipedia Infobox Analyzer
Wikipedia has different kinds of infoboxes. Modern infoboxes retrieve their data from Wikidata, but for legacy reasons many infoboxes still use manually entered values.
This analysis tool lets you see the Wikidata behind an article through the lens of its infobox. It detects whether expected fields are missing in Wikidata, in which case the values are often set manually inside the article.
As articles in other languages are created, someone might also extend the Wikidata entry. Fields that were previously set manually could then be updated to use Wikidata. This tool analyzes the infobox template in use to show which values are now present in Wikidata and which are still missing. It offers an easier side-by-side comparison than going through all Wikidata properties manually, because it looks only at the properties used by the infobox.
Infoboxes are interesting, and there is plenty more that could be checked. But the aim of this tool is to stay simple and to be used alongside the Wikipedia editor. Analysis of an entire Wikidata item is out of scope for this tool. Read the warnings on Wikipedia and Wikidata for that kind of analysis.
Usage
Here is a simple example of how the analyzer can be used for the Wikipedia article about "Earth":
wikipedia-infobox-analyzer \
    --title Earth \
    --lang en \
    --template <infobox_template_file>
By default, the tool assumes that you are looking for articles on the English Wikipedia, but you can provide the language code of another Wikipedia, such as fr, de, es, or eo.
Make sure the title you pass matches the article title exactly, so that the tool can find the corresponding Wikidata entry.
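To illustrate why the title has to match, here is a minimal sketch of how a language code and an article title could be turned into a lookup for the Wikidata item. It builds a query URL for the MediaWiki Action API of the chosen Wikipedia, asking for the `wikibase_item` page property. This is only an assumption about one possible approach, not necessarily what this tool does internally; the function names `percent_encode` and `wikidata_lookup_url` are made up for the example.

```rust
/// Percent-encodes a string so it can be used as a URL query value.
/// Only unreserved characters are kept as-is; every other byte becomes %XX.
fn percent_encode(input: &str) -> String {
    let mut out = String::new();
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Hypothetical helper: builds the API URL that asks the given Wikipedia
/// for the Wikidata item linked to an article title.
fn wikidata_lookup_url(lang: &str, title: &str) -> String {
    format!(
        "https://{}.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=wikibase_item&format=json&titles={}",
        lang,
        percent_encode(title)
    )
}
```

Calling `wikidata_lookup_url("en", "Earth")` yields a URL on `en.wikipedia.org`. If the title does not match an existing article, the response for such a query carries no `wikibase_item`, which is why the exact title matters.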
The Templates section below explains what these infobox template files are, where you can find them on Wikipedia, and how you can customize them locally for your Wikidata analysis.
Interpretation of output
The left column of the table lists all the properties that the infobox requires. The right column displays the same property ID if it is present in Wikidata; if it is not present, the right column is left blank. Properties that are present in Wikidata but not required by the infobox are listed at the end of the right column.
┌──────────────────┬─────────────────────┐
│ Infobox requires │ Wikidata Earth (Q2) │
├──────────────────┼─────────────────────┤
│ P18              │ P18                 │ // property: image
│ P170             │ P170                │ // property: creator
│ P571             │ P571                │ // property: inception
│                  │ P31                 │ // property: instance of
│                  │ P138                │ // property: named after
│                  │ P361                │ // property: part of
│                  │ ....                │
└──────────────────┴─────────────────────┘
In this (shortened) example every property the infobox requires is present in Wikidata. If we try a different infobox template on the Earth entry, the blank cells in the right column show that it is not a good fit. To demonstrate, we can apply the software template to the Earth entry:
┌──────────────────┬─────────────────────┐
│ Infobox requires │ Wikidata Earth (Q2) │
├──────────────────┼─────────────────────┤
│ P18              │ P18                 │ // property: image
│ P154             │                     │
│ P170             │ P170                │ // property: creator
│ P178             │                     │
│ P275             │                     │
│ P277             │                     │
│ P306             │                     │
│ P348             │                     │
│ P400             │                     │
│ P548             │                     │
│ P571             │ P571                │ // property: inception
│ P577             │                     │
│ P856             │                     │
│ P1324            │                     │
│ P2096            │                     │
│                  │ P10                 │
│                  │ P31                 │
│                  │ P138                │
│                  │ ....                │
└──────────────────┴─────────────────────┘
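The two-column layout described in this section can be sketched in a few lines of Rust. This is an illustrative reconstruction, not the tool's actual code: the function names `property_number` and `comparison_rows` are hypothetical, and sorting rows by the numeric part of the property ID is an assumption based on the order seen in the tables above.

```rust
use std::collections::BTreeSet;

/// Numeric part of a property ID such as "P571", so that P18 sorts
/// before P154 (plain string sorting would put P154 first).
fn property_number(id: &str) -> u32 {
    id.trim_start_matches('P').parse().unwrap_or(u32::MAX)
}

/// Builds the table rows as (left cell, right cell) pairs.
/// An empty string stands for a blank cell.
fn comparison_rows(required: &[&str], present: &[&str]) -> Vec<(String, String)> {
    let required_set: BTreeSet<&str> = required.iter().copied().collect();
    let present_set: BTreeSet<&str> = present.iter().copied().collect();
    let mut rows = Vec::new();

    // First, every property the infobox requires, in numeric order.
    // The right cell repeats the ID only if Wikidata has the property.
    let mut required_sorted: Vec<&str> = required_set.iter().copied().collect();
    required_sorted.sort_by_key(|id| property_number(id));
    for id in required_sorted {
        let right = if present_set.contains(id) { id } else { "" };
        rows.push((id.to_string(), right.to_string()));
    }

    // Then the Wikidata properties that the infobox does not require.
    let mut extra: Vec<&str> = present_set.difference(&required_set).copied().collect();
    extra.sort_by_key(|id| property_number(id));
    for id in extra {
        rows.push((String::new(), id.to_string()));
    }
    rows
}
```

A row with a blank right cell marks a property that the infobox expects but Wikidata lacks. Many such rows, as in the software template example, suggest that the template does not fit the item.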
Templates
Wikipedia sites vary a lot across languages when it comes to templates. The goal of this tool is to be universal, but as far as I am aware these templates have not been standardized.
To mitigate this, you are expected to download the template for your language from Wikipedia, and you can customize it locally. For instance, go to the infobox template for planets on the English Wikipedia (https://en.wikipedia.org/wiki/Template:Infobox_planet) and save its source to a file. Then you can add the following line to that file locally:
{{... Wikidata|P18|P31|P361|P571}}
The program ignores whatever you put in place of the ... part. This means it also accepts templates that already include a listing of the Wikidata properties they use.
As an example, this is the case for the following software template on the Dutch Wikipedia, where the first word means "uses" (https://nl.wikipedia.org/wiki/Sjabloon:Infobox_software):
{{Gebruikt Wikidata|P18|P154|P170|P178|P275|P277|P306|P348|P400|P548|P571|P577|P856|P1324|P2096}}
Ideally, the tool would discover the used properties from the template itself, but I have not found a way to do that universally. Besides, you only have to add this line once, and this way you can also customize which properties to look for.
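As a rough illustration of the property line described above, the following sketch extracts the property IDs from a template source. It assumes the line has the shape `{{<anything> Wikidata|P…|P…}}` and that everything before `Wikidata|` can be ignored. The function names `extract_properties` and `is_property_id` are hypothetical, and the real parser in this tool may work differently.

```rust
/// Returns true for strings like "P18": a 'P' followed by one or more digits.
fn is_property_id(s: &str) -> bool {
    match s.strip_prefix('P') {
        Some(digits) => !digits.is_empty() && digits.chars().all(|c| c.is_ascii_digit()),
        None => false,
    }
}

/// Extracts the property IDs from the first line of the template source
/// that contains a `Wikidata|...}}` listing. Whatever precedes the word
/// "Wikidata" (for example "Gebruikt" or "...") is ignored.
fn extract_properties(template_source: &str) -> Vec<String> {
    let marker = "Wikidata|";
    for line in template_source.lines() {
        // Skip lines without the marker or without closing braces after it.
        let Some(start) = line.find(marker) else { continue };
        let rest = &line[start + marker.len()..];
        let Some(end) = rest.find("}}") else { continue };

        let properties: Vec<String> = rest[..end]
            .split('|')
            .map(str::trim)
            .filter(|p| is_property_id(p))
            .map(str::to_string)
            .collect();
        if !properties.is_empty() {
            return properties;
        }
    }
    Vec::new()
}
```

With this sketch, both the customized English line and the Dutch `Gebruikt Wikidata` line yield the same kind of list, which is what makes a single local convention usable across languages.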