# Wikipedia Infobox Analyzer

Wikipedia has many different kinds of infoboxes.
Modern infoboxes retrieve their data from Wikidata.
For legacy reasons, however, many infoboxes still use manually entered values.

This analysis tool shows the Wikidata behind an article through the lens of its infobox.
It detects whether fields the infobox expects are missing in Wikidata, in which case the values are often set manually inside the article.

As articles are created in other languages, someone might also extend the Wikidata entry.
Fields that were previously set manually can then be updated to use Wikidata.
This tool analyzes the infobox template in use and shows which values are now present in Wikidata and which are still missing.
It offers an easier side-by-side comparison than going through all Wikidata properties manually, because it looks only at the properties used by the infobox.

Infoboxes are interesting, and there is plenty more that could be checked.
But the aim of this tool is to stay simple and to be used alongside the Wikipedia editor.
Analysis of an entire Wikidata item is out of scope for this tool.
Read the warnings on Wikipedia and Wikidata for that kind of analysis.

## Usage

Here is a simple example of how to run the analyzer on the Wikipedia article about "Earth":

``` sh
wikipedia-infobox-analyzer \
    --title Earth \
    --lang en \
    --template <infobox_template_file>
```

By default, the tool assumes that you are looking for articles on the English Wikipedia, but you can provide the language code of another Wikipedia, such as `fr`, `de`, `es` or `eo`.
If the passed title matches the article title, the tool should be able to find the Wikidata entry.
The Templates section below goes over what these infobox template files are, where you can find them on Wikipedia, and how you can customize them locally for your Wikidata analysis.

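How an article title and language code can lead to a Wikidata entry is sketched below. This is not taken from the tool's source: it assumes the lookup goes through the public MediaWiki Action API (`action=query` with `prop=pageprops`), and the function names `build_lookup_url` and `extract_item_id` are hypothetical. The sketch only builds the request URL and parses a response; it performs no network access.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def build_lookup_url(title: str, lang: str = "en") -> str:
    """Build a MediaWiki API URL asking for the Wikidata item linked to an article."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    # The language code selects the Wikipedia edition, e.g. "en" or "nl".
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def extract_item_id(response_text: str) -> Optional[str]:
    """Return the Wikidata item ID (such as "Q2") from the JSON response, or None."""
    pages = json.loads(response_text).get("query", {}).get("pages", [])
    for page in pages:
        item = page.get("pageprops", {}).get("wikibase_item")
        if item:
            return item
    # No page carried a wikibase_item, e.g. because the title does not exist.
    return None
```

If the title does not match an existing article, the response carries no `wikibase_item` and the sketch returns `None`, which is why the title passed on the command line has to match the article title.
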
### Interpretation of output

The left column of the table lists all the properties that the infobox requires.
The right column shows the same property ID if it is present in Wikidata.
If it is not present, the right column is blank.
The remaining Wikidata properties that the infobox does not require are listed at the end of the right column.

``` markdown
┌──────────────────┬─────────────────────┐
│ Infobox requires │ Wikidata Earth (Q2) │
├──────────────────┼─────────────────────┤
│ P18              │ P18                 │ // property: image
│ P170             │ P170                │ // property: creator
│ P571             │ P571                │ // property: inception
│                  │ P31                 │ // property: instance of
│                  │ P138                │ // property: named after
│                  │ P361                │ // property: part of
│                  │ ....                │
└──────────────────┴─────────────────────┘
```

This (shortened) example is complete: every property the infobox requires is present in Wikidata.
If we try a different infobox template on the Earth entry, however, the output shows that it is not a good fit.
To demonstrate, we can apply the software template to the Earth entry:

``` markdown
┌──────────────────┬─────────────────────┐
│ Infobox requires │ Wikidata Earth (Q2) │
├──────────────────┼─────────────────────┤
│ P18              │ P18                 │ // property: image
│ P154             │                     │
│ P170             │ P170                │ // property: creator
│ P178             │                     │
│ P275             │                     │
│ P277             │                     │
│ P306             │                     │
│ P348             │                     │
│ P400             │                     │
│ P548             │                     │
│ P571             │ P571                │ // property: inception
│ P577             │                     │
│ P856             │                     │
│ P1324            │                     │
│ P2096            │                     │
│                  │ P10                 │
│                  │ P31                 │
│                  │ P138                │
│                  │ ....                │
└──────────────────┴─────────────────────┘
```

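The pairing of the two columns described in this section can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function name `compare` and the choice to sort by the numeric part of the property ID are assumptions based on the example tables.

```python
def numeric(prop_id: str) -> int:
    """Sort key: the number after the leading "P", so P31 sorts before P138."""
    return int(prop_id[1:])


def compare(required, present):
    """Return (left, right) rows: required properties first, then leftovers."""
    required_set = set(required)
    present_set = set(present)
    # Required properties: repeat the ID on the right only if Wikidata has it.
    rows = [
        (prop, prop if prop in present_set else "")
        for prop in sorted(required_set, key=numeric)
    ]
    # Properties in Wikidata that the infobox does not ask for trail at the end.
    rows += [("", prop) for prop in sorted(present_set - required_set, key=numeric)]
    return rows


if __name__ == "__main__":
    for left, right in compare(["P18", "P154", "P170"], ["P18", "P31", "P170"]):
        print(f"{left:<6} | {right}")
```

A blank right-hand cell marks a property the infobox requires but Wikidata lacks, which is the case where an article is likely to hold a manually set value.
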
## Templates

Templates vary a lot between the different language editions of Wikipedia.
The goal of this tool is to be universal, but as far as I am aware these templates have not been standardized.

To work around this, you are expected to download the template for your language from Wikipedia, and you can customize it locally.
For instance, go to the infobox template for planets on the English Wikipedia (https://en.wikipedia.org/wiki/Template:Infobox_planet) and download its source to a file.
Then add the following line to that local file:

``` text
{{... Wikidata|P18|P31|P361|P571}}
```

The program ignores whatever you put in place of the `...`.
This lets it accept templates that already include a listing of the Wikidata properties they use.
As an example, this is the case for the following software template on the Dutch Wikipedia (the first word means "uses"; the template can be found at https://nl.wikipedia.org/wiki/Sjabloon:Infobox_software):

``` text
{{Gebruikt Wikidata|P18|P154|P170|P178|P275|P277|P306|P348|P400|P548|P571|P577|P856|P1324|P2096}}
```

Ideally, the properties a template uses would be discovered automatically from the template itself, but I have not found a way to do that universally.
Besides, you only have to do this once, and this way you can also customize the properties to look for.

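Reading the property listing out of a template file could look like the sketch below. It assumes the line has the shape shown in the examples above (any text, then `Wikidata`, then `|`-separated property IDs); the regular expression and the function name `extract_properties` are illustrative and may differ from how the tool actually parses templates.

```python
import re

# A template call whose name ends in "Wikidata", followed by one or more
# "|P<number>" arguments. Whatever precedes "Wikidata" is ignored.
USES_WIKIDATA = re.compile(r"\{\{[^{}|]*Wikidata((?:\|\s*P\d+\s*)+)\}\}")


def extract_properties(template_source: str) -> list:
    """Return the property IDs from the first matching listing, or an empty list."""
    match = USES_WIKIDATA.search(template_source)
    if match is None:
        return []
    # group(1) looks like "|P18|P154"; splitting leaves an empty first piece.
    return [part.strip() for part in match.group(1).split("|") if part.strip()]


if __name__ == "__main__":
    print(extract_properties("{{Gebruikt Wikidata|P18|P154|P170}}"))
```

Because the text before `Wikidata` is skipped, the same sketch handles both the Dutch `Gebruikt Wikidata` line and a locally added `... Wikidata` line.
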