Wikipedia:WikiMUC/Federated Queries Workshop/Documentation


Day 1 Part A

Intro by Lucas. QLever on OSM: https://wiki.openstreetmap.org/wiki/QLever

Presentation 1: Katharina Brunner "Remove NA":

Project from the Queer Archive in Munich

  • Q: for people without experience, what would be a good tutorial or learning materials for where to start?
    • A good tutorial would be best: one that starts with the very introductory/simple material, to make it accessible, and then builds up from there
  • Q: Did you find other people with similar queer data? Did you find a community?
    • I followed the Wikidata LGBT community data models; someone [please add name] seems to have done most of the modelling
    • Community at the queer archives (outside wiki community)
      • Often professional digital archivers
  • Q: full dataset on Factgrid and then some of it on Wikidata. How do you decide what goes in WD?
    • One of the datasets was posters/books from the queer archive in Munich
      • We don't need all of this on Wikidata
      • Certain publishing houses have been publishing since the 1890s and are still active
      • It's good to be able to label items about these publishing houses to say they are publishing queer books

Presentation 2: Maria Hinzmann and Julia Röttgermann Bidirectional Federated Queries on MiMoTextBase and Wikidata

-> Federation is at the core of Linked Open Data

  • Work about the French Enlightenment
    • transferable to other disciplines/domains
    • Atomisation of literary history
    • Break it down
  • Higher density of assertions for this narrow domain.
    • some data were meant for Wikidata too, but not everything.
    • they wanted to avoid redundancies
    • solution, to link
  • Connections between Wikibases for federation are represented with an external MiMoText ID property on Wikidata and a URL "exact match" property on the MiMoTextBase

  • Examples of Uses
    • Querying from MiMoText
    • Linking geographic information from Wikidata with information about narrative themes / locations from MiMoText
    • Finding alternative names of authors: taking all the various alternative names from Wikidata instead of maintaining all these alternative names on MiMoText
  • Using 3 endpoints:
    • Starting on MiMoText, query BNF (French national library) identifiers from Wikidata, and then query the BNF SPARQL endpoint for further information
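The three-endpoint pattern described above could be sketched as follows. Note that the mmdt: prefix, the P99 property, the BnF IRI pattern, and the BnF endpoint URL are all placeholders/assumptions for illustration, not the real MiMoTextBase or BnF vocabulary:

```sparql
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX mmdt: <https://example.org/mimotext/prop/direct/>   # placeholder, not the real namespace

SELECT ?work ?wdItem ?bnfId ?o WHERE {
  ?work mmdt:P99 ?wdItem .                       # hypothetical "exact match" link to Wikidata
  SERVICE <https://query.wikidata.org/sparql> {
    ?wdItem wdt:P268 ?bnfId .                    # BnF ID (P268) on Wikidata
  }
  # Third hop: construct a BnF resource IRI from the ID and query the
  # BnF endpoint. The IRI pattern and endpoint URL are assumptions.
  BIND(IRI(CONCAT("https://data.bnf.fr/ark:/12148/cb", ?bnfId)) AS ?bnfResource)
  SERVICE <https://data.bnf.fr/sparql> {
    ?bnfResource ?p ?o .                         # further BnF triples (schematic)
  }
}
```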

Querying from Wikidata

  • Narrative forms
    • Property: literary forms. Not available in Wikidata, but in MiMoText
    • A controlled vocabulary exists on Wikidata? So we match the raw strings from MiMoText to the controlled vocabulary there? [ask for clarification]
  • Some more Example Queries: Wikidata -> MiMoTextBase

https://www.dariah.eu/2024/11/04/dhwiki-a-new-dariah-eu-working-group-focusing-on-building-bridges-between-different-sectors/

  • Possible Questions:
    • Does a controlled vocabulary exist on Wikidata, and are the raw strings from MiMoText matched to the controlled vocabulary there? (This was to try and fill in the notes from above :P)
    • First the raw strings were matched with the controlled vocabulary, then items in MiMoText were matched with the items in Wikidata
  • Questions:
    • How did you decide what to contribute back to Wikidata or not?
      • There is a slide (8.4). Only novels that had semantic narrative locations from more than one source with at least 3 statements. This was about 800 of 1800 novels.
    • Were you running your own OpenRefine reconciliation service, and if so, how did you find doing that?
      • Initially we used a bot for the bigger dataset. We also used QuickStatements - it was a good experience
    • OpenRefine from MiMoText?
      • No, the OpenRefine reconciliation service as it is

Presentation 3: Lozana Rossenova NFDI4Culture use cases for Data Enrichment and Federation with Wikibase4Research


Wikibase with additional extensions and tools, developed by the Open Science Lab (within TIB Hannover).

A Wikibase that contains real items and digital representations of real items. Federation is useful so we don't need to hold the information on the real objects, but we do want to hold information on the 3D digital representations.

Use case: find 3D models of things within a certain geographic area, using the locations of the real objects from Wikidata.

Federation with Wikidata: https://query.semantic-kompakkt.de/#%23defaultView%3AMap%0A%0A%23%20Find%20all%20castles%20in%20the%20Wikibase%3B%20look%20on%20Wikidata%20for%20other%20renaissance%20castles%20nearby%0APREFIX%20wd%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0APREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0APREFIX%20wdqs%3A%20%3Chttps%3A%2F%2Fquery.wikidata.org%2Fsparql%3E%0A%0ASELECT%20%20%3Fcastle%20%3FcastleLabel%20%3Flocation%20%3Fimage%0AWHERE%20%7B%0A%20%20%23Find%20all%20castles%20and%20their%20coordinates%0A%20%20%3Fitem%20tibt%3AP97%20tib%3AQ201.%0A%20%20%3Fitem%20tibt%3AP37%20%3Fcoordinates.%0A%20%20%0A%20%20%23Query%20wikidata%0A%20%20SERVICE%20wdqs%3A%20%7B%0A%20%20%20%20%0A%20%20%20%20%23Find%20castles%20with%20renaissance%20architectural%20style%0A%20%20%20%20%3Fcastle%20wdt%3AP31%20wd%3AQ751876.%0A%20%20%20%20%3Fcastle%20wdt%3AP149%20wd%3AQ236122.%0A%20%20%20%20%0A%20%20%20%20%23Look%20for%20those%20castles%20in%20a%20radius%20of%20100km%20around%20our%20castle%0A%20%20%20%20SERVICE%20wikibase%3Aaround%20%7B%20%0A%20%20%20%20%20%20%3Fcastle%20wdt%3AP625%20%3Flocation%20.%20%0A%20%20%20%20%20%20bd%3AserviceParam%20wikibase%3Acenter%20%3Fcoordinates%20.%20%0A%20%20%20%20%20%20bd%3AserviceParam%20wikibase%3Aradius%20%22100%22%20.%20%0A%20%20%20%20%7D%0A%20%20%20%20%0A%20%20%20%20%23Get%20labels%20from%20Wikidata%0A%20%20%20%20%3Fcastle%20rdfs%3Alabel%20%3FcastleLabel.%0A%20%20%20%20OPTIONAL%20%7B%20%3Fcastle%20wdt%3AP18%20%3Fimage.%20%7D%0A%20%20%20%20FILTER%28%28LANG%28%3FcastleLabel%29%29%20%3D%20%22de%22%29%0A%20%20%7D%0A%7D

3-way federated queries: Wikidata, FactGrid and own instance. Use Wikidata to get FactGrid IDs.

    • Federation not only with WD (or other WB instances)

QLever of DBLP: https://sparql.dblp.org/

Questions:

    • Did you start with pain points and Wikibase was the solution for this?
      • The idea of having wikis is older; there were citizen science projects using WD
      • When it started, it needed to use Kompakkt. Kompakkt doesn't have a database

Presentation 4: Hannah Bast Federated Queries with QLever

Qlever https://qlever.cs.uni-freiburg.de/wikidata

First query: all films on Wikidata along with their IMDb rating: https://qlever.cs.uni-freiburg.de/wikidata/hIX2ee

This queries IMDb simultaneously, to have IMDb ratings together with the information from Wikidata.

Second query: the same, but only for films by Quentin Tarantino: https://qlever.cs.uni-freiburg.de/wikidata/cwNuCF

Third query: the power network of the EU: https://qlever.cs.uni-freiburg.de/osm-planet/aXwfk5

Fourth query: all buildings in Germany near a train or subway station: https://qlever.cs.uni-freiburg.de/osm-planet/jKxS2A
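A schematic version of the first query, joining Wikidata films with IMDb ratings via the IMDb ID (P345). The IMDb endpoint URL and the imdb: predicate names are assumptions for illustration, not verified vocabulary (the linked saved queries show the real thing):

```sparql
PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX imdb: <https://example.org/imdb/>   # placeholder prefix

SELECT ?film ?imdbId ?rating WHERE {
  ?film wdt:P31 wd:Q11424 ;                # instance of (P31) film (Q11424)
        wdt:P345 ?imdbId .                 # IMDb ID, a plain literal
  SERVICE <https://qlever.cs.uni-freiburg.de/api/imdb> {   # endpoint URL: assumption
    ?entry imdb:id ?imdbId ;               # join on the shared ID literal
           imdb:averageRating ?rating .
  }
}
```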

Questions:

    • How are the different OSM taggings represented as triples?
    • Are there differences in the SPARQL dialect, e.g. with the label service?

Presentation 5: Max Kristen DeJongeWiki and WikiFAIR

Problem: running Wikibase in a restrictive university environment. Pain point: Java.

Blazegraph was a problem very early on; the wiki doesn't support the Query Service at the moment.

Wiki: https://set.kuleuven.be/rlicc/dejongewiki/w/index.php/Main_Page

Example Room: https://set.kuleuven.be/rlicc/dejongewiki/w/index.php/AC_MC_GW_1F_01.01

Wikibase of Example Room: https://set.kuleuven.be/rlicc/dejongewiki/w/index.php/Item:Q46

Shown Extensions:

   https://www.mediawiki.org/wiki/Extension:LinkedWiki
   https://www.mediawiki.org/wiki/Extension:UnlinkedWikibase

WikiFAIR: https://meta.wikimedia.org/wiki/WikiFAIR

Day 1 Part B

Example queries (even from people that can't attend!):

      1. Get Wikidata ID if some other matching ID exists
   PREFIX LUDAP: <redacted/entity/>
   PREFIX LUDAPt: <redacted/prop/direct/>
   PREFIX wdt: <http://www.wikidata.org/prop/direct/>
   PREFIX wd:  <http://www.wikidata.org/entity/>
   PREFIX p: <http://www.wikidata.org/prop/>
   PREFIX pr: <http://www.wikidata.org/prop/reference/>
   PREFIX prov: <http://www.w3.org/ns/prov#>
   SELECT ?person ?personName ?WDitem WHERE {
   ?person LUDAPt:P1 LUDAP:Q17 . # instance of a person
   ?person LUDAPt:P9 ?personName.
   ?person LUDAPt:P41 ?GETTY. # Getty identifier
   FILTER NOT EXISTS { ?person LUDAPt:P70 ?WikidataID } . # No Wikidata ID


   SERVICE <https://query.wikidata.org/sparql> {
       ?WDitem wdt:P245 ?GETTY .
       }
   } ORDER BY ?personName


   ### Get other ID if Wikidata qid exists
   SELECT ?person ?personName ?WDitem ?WDName ?ISNI_link WHERE {
   ?person LUDAPt:P1 LUDAP:Q17 . # instance of a person
   ?person LUDAPt:P9 ?personName.
   ?person LUDAPt:P70 ?WikidataID. # Wikidata ID exists
   FILTER NOT EXISTS { ?person LUDAPt:P37 ?ISNI } . # ISNI does not exist
   BIND(IRI(CONCAT("http://www.wikidata.org/entity/", ?WikidataID )) AS ?WDitem ). #Construct IRI for Wikidata    
   SERVICE <https://query.wikidata.org/sparql> {
       ?WDitem wdt:P213 ?WDISNI ;
         rdfs:label ?WDName .
         FILTER (langMatches( lang(?WDName), "FR" ) ) # Get French label
       }
   BIND (URI(CONCAT("https://isni.org/isni/",?WDISNI)) AS ?ISNI_link) #Present full ISNI ID for testing purposes.
   } ORDER BY ?personName


   ### Data validation on Wikidata
   SELECT ?person ?personName ?VIAF ?WDitem ?WDVIAF ?VIAF_link WHERE {
   ?person LUDAPt:P1 LUDAP:Q17 . # instance of a person
   ?person LUDAPt:P9 ?personName.
   ?person LUDAPt:P70 ?WikidataID. # get local Wikidata QID
   ?person LUDAPt:P38 ?VIAF. # get local VIAF
   BIND(IRI(CONCAT("http://www.wikidata.org/entity/", ?WikidataID )) AS ?WDitem ) .  #Construct IRI for Wikidata 
   SERVICE <https://query.wikidata.org/sparql> {
       ?WDitem wdt:P214 ?WDVIAF . #Get VIAF from Wikidata
       }
   BIND (URI(CONCAT("https://viaf.org/viaf/",?WDVIAF)) AS ?VIAF_link)
     FILTER (?WDVIAF != ?VIAF) #only show when local and Wikidata VIAF are NOT the same
   } ORDER BY ?personName


   ### Double federated queries to retrieve GND info based on Wikidata ID
   PREFIX gndo:<https://d-nb.info/standards/elementset/gnd#>


   SELECT DISTINCT ?person ?personName ?WDGND ?GNDName ?dateBirth ?GNDBirth ?dateDeath ?GNDDeath WHERE {
     ?person LUDAPt:P1 LUDAP:Q17; #instance of person
       LUDAPt:P9 ?personName;
       LUDAPt:P70 ?WikidataID;
       LUDAPt:P19 ?dateBirthRAW.
     FILTER((DATATYPE(?dateBirthRAW)) = xsd:dateTime) # filter out EDTF dates
     BIND(SUBSTR(STR(?dateBirthRAW), 0 , 11 ) AS ?dateBirth) # Convert datetime for ease of matching.
     ?person LUDAPt:P21 ?dateDeathRAW.
     FILTER((DATATYPE(?dateDeathRAW)) = xsd:dateTime) # filter out EDTF dates
     BIND(SUBSTR(STR(?dateDeathRAW), 0 , 11 ) AS ?dateDeath) # Convert datetime for ease of matching.
     FILTER(NOT EXISTS { ?person LUDAPt:P39 ?GND. }) # No GND should exists
     BIND(IRI(CONCAT("http://www.wikidata.org/entity/", ?WikidataID)) AS ?WDitem) # Create Wikidata IRI
     SERVICE <https://query.wikidata.org/sparql> { 
       ?WDitem wdt:P227 ?WDGND. # Retrieve GND from Wikidata
   } 
     SERVICE <http://zbw.eu/beta/sparql/gnd/query> {
       ?gndPerson gndo:gndIdentifier ?WDGND; # Use the Wikidata GND identifier to retrieve further information from GND for matching purposes.
         gndo:preferredNameForThePerson ?GNDName; 
         gndo:dateOfBirth ?GNDBirth;
         gndo:dateOfDeath ?GNDDeath.
     }
   }
   ORDER BY (?personName)

A person from FactGrid that has any connection to a person or organisation in Wikidata and DBpedia:

Documentation of the second part of Day 1:

  • formatter URI for RDF resource (https://www.wikidata.org/wiki/Property:P1921) can be used to make Wikibase generate the correct URI for an entity, instead of having to use BIND(IRI(CONCAT("somePrefix", ?id)) AS ?x) in your query; but it’s tricky to use correctly:
    • create the property
    • add the formatter URI for RDF resource
    • wait 24 hours for the property info cache to expire
    • only then start adding actual statements for the property
    • difference between http and https!

Great working example on QLever showing genres on MimoText: https://qlever.cs.uni-freiburg.de/wikidata/NlrCC3

https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Federated_queries: Second example uses Wikidata to determine the nationality of the artists

Determine Education by artist:

Explain prefixes better in the federation examples, and also when they are not needed (see example above; Pin button in the query service)

Explain the intrinsic details of the BIND keyword. The Label Service works only on Wikibase endpoints! For other endpoints, use rdfs:label.
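A sketch of the rdfs:label approach inside a federated SERVICE block, since the wikibase:label service is not available on non-Wikibase endpoints (Q1028181 is Wikidata's "painter" occupation item):

```sparql
PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?painter ?painterLabel WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    ?painter wdt:P106 wd:Q1028181 .        # occupation (P106): painter
    ?painter rdfs:label ?painterLabel .    # plain rdfs:label...
    FILTER(LANG(?painterLabel) = "en")     # ...with an explicit language filter
  }
}
LIMIT 10
```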

Allowlisting!!

Dragan's question(s): What's better to use for external links, a "same as" property pointing to the remote entity or an "external id" Wikibase item? Is there a way to associate a SPARQL endpoint URL with a property or value that can be used in federated querying?

Day 2 (06.12.2024): https://de.wikipedia.org/wiki/Wikipedia:WikiMUC/Federated_Queries_Documentation_Brunch




Documentation to (possibly) improve:


Deprecation of following pages:

Various community tutorials from Wikibase projects (FactGrid, MiMoText, etc.) - consolidate & interlink

    • Instructions for Wikibase configuration - where?

The two ;-) lists:


Update MultiChillBot?

For 3rd-party Wikibases using the Wikibase Suite images, the allowlist is hard-coded at:

Open Phabricator tickets to remove broken endpoints:

-> Try to categorise those

Possible categories:

    • Wikidata (whitelisting?)
      • What about Cloud?
      • Federation across different Graph Engines (!)

What do we think is missing:

    • Whitelisting

Wikibase Stakeholder Meeting that day: https://notepad.rhizome.org/wbsg-2024-12-05?edit

3 Working Groups (add your names and notes below each group):

General documentation

Updated Pages:

Introduction:

The vision of Linked Open Data is to overcome disconnected data silos. Federated queries, in other words the simultaneous querying of multiple SPARQL endpoints, play a key role in realizing this potential. Ideally, different datasets that have been created in very different contexts can refer to each other, so that 'federation' brings together knowledge stored in different locations in the global Linked Open Data Cloud. In order to actually be able to connect several knowledge graphs via 'federated queries', a number of requirements must be met. This page focuses on federated queries that take Wikidata as a starting point and address other SPARQL endpoints.

(To learn more about what steps are necessary when hosting a Wikibase instance yourself, see: #)

Connected items

In order to be able to write federated queries with Wikidata as the starting point, you need to know the Wikidata data model (https://commons.wikimedia.org/wiki/File:SPARQL_data_representation.png) and the data model(s) of the SPARQL endpoint(s) you want to query. To enable federation, the items in the different knowledge graphs must refer to each other. There are standardized properties such as owl:sameAs and exact match (P2888) for this purpose. In addition, matching in Wikidata often works via "external identifiers". Precise knowledge of the data model with regard to the respective property is therefore important.


Allowlist

The SPARQL endpoints must also each allow federation to another SPARQL endpoint. The current [Wikidata allowlist #] provides an overview of all SPARQL endpoints that you can query via Wikidata.


Prefixes

When creating queries manually, or using a visual query helper, it is useful to understand the concept of prefixes. Prefixes are shortcuts replacing the need to write the full URI path to a resource. Prefixes can be defined at the start of every SPARQL query to ease readability and avoid potential spelling mistakes. The Wikidata prefixes are built into the Wikidata Query Service, so queries also work without the Wikidata prefixes being defined again in the query. However, it is good practice to make all relevant prefixes (including those from Wikidata) explicit, especially when writing federated queries.
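For example, the two most common Wikidata prefixes written out explicitly; on the Wikidata Query Service these are built in, but on another endpoint they must be declared like this (Q6256 is Wikidata's "country" item):

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?country WHERE {
  ?country wdt:P31 wd:Q6256 .   # instance of (P31) country (Q6256)
}
```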


Calling up an external endpoint

The basic mechanism of federation is to use the SERVICE operator to call up another endpoint. As part of the prefix defaults, wdqs already defines the Wikidata endpoint <https://query.wikidata.org/sparql>, so a simple federation syntax to Wikidata would look like this:

SERVICE wdqs {

?a ?b ?c

}

or, if not using prefixes, the call would look like this:

SERVICE <https://query.wikidata.org/sparql> {

?a ?b ?c

}

Using external identifiers / IRIs / and the BIND operator

When federating, we typically want to get more information about items that exist, or are referenced, in more than one LOD resource. This means we need a connecting mechanism between the two resources. One common way of doing that is using external ID properties in Wikidata. However, the default result when asking for the value of an external ID property in Wikidata is a literal, not an IRI that can be used in federated queries. This means we will only get a QID, rather than a full IRI. To be able to re-use the same QID as a starting point in the federated part of the query, we need to BIND the QID literal to a path (either with a prefix or not), using the following syntax:

BIND(IRI(CONCAT(STR(wd:), ?Wikidata_id)) AS ?Wikidata_item)
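A minimal sketch of this pattern in context. The myt: prefix and its P70 "Wikidata QID" property are placeholders standing in for a local Wikibase's own vocabulary:

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX myt: <https://example.org/prop/direct/>   # placeholder prefix

SELECT ?item ?Wikidata_item ?birth WHERE {
  ?item myt:P70 ?Wikidata_id .   # QID stored as a plain literal, e.g. "Q42"
  BIND(IRI(CONCAT(STR(wd:), ?Wikidata_id)) AS ?Wikidata_item)
  SERVICE <https://query.wikidata.org/sparql> {
    ?Wikidata_item wdt:P569 ?birth .   # date of birth (P569) from Wikidata
  }
}
```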

There is a workaround to avoid using BIND, but there are some caveats. Instead of using the typical path to request a value for a property (i.e. via wdt:, or <http://www.wikidata.org/prop/direct/>), you could also use wdtn: (or <http://www.wikidata.org/prop/direct-normalized/>). This will call up the value generated via the formatter URI for RDF resource (https://www.wikidata.org/wiki/Property:P1921), if it has been defined for the respective external ID property. This will then generate the correct URI for an entity, instead of having to use BIND(IRI(CONCAT("somePrefix", ?id)) AS ?x) in your query. But there are some potential issues that have to be kept in mind:

    • once the formatter URI for RDF resource has been added, you have to wait 24 hours for the property info cache to expire
    • only then start adding actual statements for the property, previously added statements will not be formatted properly (so for those BIND operator will still be needed)
    • note the difference between http and https - the source of truth is always the Concept URI (available in the sidebar on most Wikibase installations including Wikidata); the correct address (http/https) method has to be used consistently when defining prefixes, BINDings, or when adding values to the formatter URI for RDF property.
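The wdtn: shortcut can be illustrated with VIAF (P214), whose formatter URI for RDF resource is defined on Wikidata, so the direct-normalized namespace returns a ready-made IRI and no BIND is needed:

```sparql
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX wdtn: <http://www.wikidata.org/prop/direct-normalized/>

SELECT ?item ?viafLiteral ?viafIri WHERE {
  ?item wdt:P214  ?viafLiteral .   # VIAF ID as a plain string literal
  ?item wdtn:P214 ?viafIri .       # the same value as a viaf.org IRI
}
LIMIT 10
```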


Examples

(Katharina, Lucas,...)


Allowlist

(Tom, Lukas)


TO DOs:


NEEDS from the community scoped for Wikidata / Wikibase Suite team:

  • Make defining prefixes (in the Pin button - i.e. in the UI) easier in the Docker distribution for WB Suite + get rid of the "magic" implicit Prefixes.
  • Default prefixes (both "magic" ones and the prefixes defined by the Pin button) should be custom for WB Cloud, not same as wd / wdt etc (Configuration open per Wikibase)
  • Document setting up the SPARQL endpoint in the Docker distribution (not the WDQS address but the actual query.[your WB address and domain]/sparql)
  • Easier configuration of allowlist.txt - TIB to share their findings after the workshop - this needs to be enabled not just for Suite but also for Cloud.
  • Better documentation for Wikibase Cloud specifics, e.g. if and how they federate with each other
  • UI Update for the "Pin" function of the Query Service UI as a more prominent button not in the sidebar, but above the query editing space - reference the QLever UI
    • Selectable Dropdown of Allowlist - this can be tied to the Pin button, so that once selecting a service from the dropdown, the relevant prefixes are added at the top of the query editor
    • The Wikidata Query Service UI is not neutral for other Wikibases, e.g. the logo; the auto-complete function should work depending on the selected prefixes

2024-12-05 13:38

Example of a Wikibase query service quirk when the shortened-URL creation fails. We were unsure if this is due to the Wikibase installation's connection to the external URL-shortening service, TinyURL, or something else at work. Ideally, Wikibase would provide a URL shortener similar to Wikidata's w.wiki that can be customized by the respective Wikibase instance. In our case, we'd love to be able to have the namespace be, for instance, w.siwiki.

Smithsonian Wikibase: SI-WikiNames federated query example below (comments in CAPs explaining the issue inserted between ### lines)

  1. Citizenship of artists from WD
  2. defaultView:ImageGrid


PREFIX wddt: <http://www.wikidata.org/prop/direct/>

SELECT DISTINCT ?person ?remote_item ?person_label ?image ?citizenship_label WHERE {

?person p:P208/ps:P208/wdt:P209* wd:Q18 ; # instance of human/person (Q18)

        wdt:P17 ?remote_item ; # exact match with external URL WD QID

        rdfs:label ?person_label .

### IN ORDER TO GENERATE THE SHORTENED URL, A FORCED SPACE WAS NEEDED AFTER THE "CONCAT" FUNCTION ###

BIND(uri(concat ("http://www.wikidata.org/entity/", ?remote_item)) AS ?WID)


SERVICE <https://query.wikidata.org/sparql> {

?WID wddt:P27 ?citizenship .

?citizenship rdfs:label ?citizenship_label .

OPTIONAL { ?WID wddt:P18 ?image . }

FILTER (lang(?citizenship_label) = "en")

}

}

ORDER BY (?person_label)