Associating ISO19115-1 items with a provenance query service

Contributor: 
Evert Bleys

Introduction

Organisations wishing to associate items in an ISO19115-1-conformant catalogue with graph-based provenance delivered by a provenance query service (likely a SPARQL service) may wish to make the link to that service as visible as possible to people used to ISO19115-1 norms. Since ISO19115-1 typically uses the Lineage field for structured or unstructured lineage information (provenance), this pattern suggests placing the link to the provenance query service in that field.

Prerequisite

PROV DM in OWL ontology form (PROV-O) and SPARQL services require provenance objects to be identified by HTTP URIs. For this reason, any object catalogued according to ISO19115-1 that is to link to a provenance query service as described here must be able to be referred to by URI. Some ISO19115-1-compliant catalogues use URIs, but some only use UUIDs and do not provide persistent URIs based on those UUIDs. In that case, an effort must be made to establish persistent URIs using those UUIDs before integration with a provenance query service.
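
One way to approach this prerequisite, sketched below in Python, is to mint an HTTP URI for each record by appending its UUID to a fixed base. The base URI http://example.com/dataset/ and the function name uri_from_uuid are assumptions made for illustration only. Persistence itself depends on the organisation keeping that base resolvable, which code cannot guarantee.

```python
import uuid

# Hypothetical base URI under which the catalogue commits to keeping records resolvable.
BASE_URI = "http://example.com/dataset/"


def uri_from_uuid(record_uuid: str, base_uri: str = BASE_URI) -> str:
    """Return an HTTP URI for a catalogue record identified by a UUID."""
    # uuid.UUID() raises ValueError on malformed input and normalises the
    # value to the canonical lowercase, hyphenated form.
    canonical = str(uuid.UUID(record_uuid))
    if not base_uri.endswith("/"):
        base_uri += "/"
    return base_uri + canonical


print(uri_from_uuid("7C2E9A10-4B1F-4F0C-9D3E-2A6B8C1D5E7F"))
# prints http://example.com/dataset/7c2e9a10-4b1f-4f0c-9d3e-2a6b8c1d5e7f
```

Normalising the UUID before building the URI means that two spellings of the same UUID always yield the same identifier.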

Implementation

This pattern is a specialisation of Pattern 12.1. It suggests placing a has_query_service link, as PROV-AQ calls it, in the LI_Lineage field in a manner conformant with ISO19115-1.

Pattern 25 diagram

In PROV terms

Using this method, we can work out that the query service itself is a PROV Entity (when described) and an Agent (when performing actions like causing Entities to be generated) but we can't say anything, in PROV terms, about the item linking to the query service. It could be any class of PROV object since any object may have provenance about it made available by a query service. Having said that, it is likely to be an Entity as ISO19115-1 catalogues tend to store information about sorts of Entities such as Datasets or Images.

ISO19115-1 Structure

The suggested structure is for the resourceLineage field of a metadata document to contain a LI_Lineage element containing a LI_Source which, in turn, cites a CI_OnlineResource described by a name, function, linkage, protocol and protocolRequest. Note that ISO19115-1 documents may contain multiple LI_Lineage objects within a resourceLineage field, so a reference to a provenance query service may be made alongside written statements of provenance, which are the current usage norm, or other structured lineage.

The specific elements that should be used to communicate that the metadata document is linking to a query service are described here:

  • CI_Citation.title - compulsory for any CI_Citation use. Should indicate a well-known name for the thing cited - the query service. Could be a phrase similar to "Organisation Z's Provenance Query Service".
  • CI_OnlineResource.name - optional. Likely something similar to the citation title above.
  • CI_OnlineResource.function - required. The code term provenanceQueryService should be used.
    This term is available in the Geoscience Australia codelist extension to ISO19115-1's CI_OnlineFunctionTypeCode code list.
  • CI_OnlineResource.linkage - the URI to the endpoint of the query service
  • CI_OnlineResource.protocol - required. While ISO19115-1 treats this field as free text, the widely used ISO19115-1 catalogue tool GeoNetwork implements a code list for it in its ISO19115-1 protocols vocabulary. Geoscience Australia includes a term to indicate SPARQL services, "HTTP-SPARQL", and this should be used in the case of a SPARQL service.
  • CI_OnlineResource.protocolRequest - optional. An example request. The most basic SPARQL query that could be used to find information about a resource is a DESCRIBE query, which can be lodged over HTTP via a GET request. This field should give an example of that or, for another kind of query service, an example of its most basic query. For a SPARQL service, it also needs to indicate how to identify the resource being queried for, which is the resource the ISO metadata is describing. In the SPARQL case, the query would be DESCRIBE <resource> and, if the resource was identifiable via a URI, perhaps http://example.com/dataset/x, then it would be DESCRIBE <http://example.com/dataset/x>. The full information for this field would then be that query expressed as a request to the provenance service, which is just an HTTP GET link. See the example below.

<mdb:resourceLineage>
  <mrl:LI_Lineage>
    <mrl:LI_Source>
      <mrl:sourceCitation>
        <cit:CI_Citation>
          <cit:title>
            <gco:CharacterString>{SERVICE_NAME}</gco:CharacterString>
          </cit:title>
          <cit:onlineResource>
            <mcc:CI_OnlineResource>
            <cit:name>
              <gco:CharacterString>{SERVICE_NAME}</gco:CharacterString>
            </cit:name>
            <cit:function>
              <cit:CI_OnLineFunctionCode codeList="codeListLocation#CI_OnLineFunctionCode" codeListValue="provenanceQueryService"/>
            </cit:function>
            <cit:linkage>
              <gco:CharacterString>{ENDPOINT_URI}</gco:CharacterString>
            </cit:linkage>
            <cit:protocol>
              <gco:CharacterString>{PROTOCOL}</gco:CharacterString>
            </cit:protocol>
            <cit:protocolRequest>
              <gco:CharacterString>{REQUEST_EXAMPLE}</gco:CharacterString>
            </cit:protocolRequest>
            </mcc:CI_OnlineResource>
          </cit:onlineResource>
        </cit:CI_Citation>
      </mrl:sourceCitation>
    </mrl:LI_Source>
  </mrl:LI_Lineage>
</mdb:resourceLineage>
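
When the placeholders in this template are filled programmatically, values such as service names or request URLs may contain characters (&, < or >) that must be escaped to keep the XML well-formed. The Python sketch below shows one possible way to do that. The function name fill_template and the sample values are assumptions for illustration, and the short excerpt stands in for the full template, which would be filled the same way.

```python
import re
from xml.sax.saxutils import escape

# Matches placeholders written as {UPPER_CASE_NAME}, as used in the template above.
PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def fill_template(template: str, values: dict) -> str:
    """Replace each {PLACEHOLDER} with its XML-escaped value in a single pass."""

    def substitute(match):
        # A missing key raises KeyError rather than leaving a placeholder unfilled.
        return escape(values[match.group(1)])

    return PLACEHOLDER.sub(substitute, template)


# Excerpt of the template; the element names are copied from it unchanged.
excerpt = (
    "<cit:title>\n"
    "  <gco:CharacterString>{SERVICE_NAME}</gco:CharacterString>\n"
    "</cit:title>"
)
print(fill_template(excerpt, {"SERVICE_NAME": "R&D Provenance Query Service"}))
```

Substituting in a single pass means that a value which happens to contain text like {PROTOCOL} is not itself treated as a placeholder.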

Example

For a Dataset X with a SPARQL query service at http://location.com/sparql, the following XML might be used:


<mdb:resourceLineage>
  <mrl:LI_Lineage>
    <mrl:LI_Source>
      <mrl:sourceCitation>
        <cit:CI_Citation>
          <cit:title>
            <gco:CharacterString>Provenance Query Service</gco:CharacterString>
          </cit:title>
          <cit:onlineResource>
            <mcc:CI_OnlineResource>
            <cit:name>
              <gco:CharacterString>Corporation A's Provenance Finder</gco:CharacterString>
            </cit:name>
            <cit:function>
              <cit:CI_OnLineFunctionCode codeList="codeListLocation#CI_OnLineFunctionCode" codeListValue="provenanceQueryService"/>
            </cit:function>
            <cit:linkage>
              <gco:CharacterString>http://location.com/sparql</gco:CharacterString>
            </cit:linkage>
            <cit:protocol>
              <gco:CharacterString>HTTP-SPARQL</gco:CharacterString>
            </cit:protocol>
            <cit:protocolRequest>
              <gco:CharacterString>
                http://location.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fexample.com%2Fdataset%2Fx%3E
              </gco:CharacterString>
            </cit:protocolRequest>
            </mcc:CI_OnlineResource>
          </cit:onlineResource>
        </cit:CI_Citation>
      </mrl:sourceCitation>
    </mrl:LI_Source>
  </mrl:LI_Lineage>
</mdb:resourceLineage>

The CI_OnlineResource.protocolRequest value used in this example is the basic SPARQL DESCRIBE query for the resource, sent to the SPARQL service as a GET request. It is therefore a single link consisting of:

  • the base URI of the SPARQL service (as per the CI_OnlineResource.linkage field): http://location.com/sparql
  • the SPARQL service's standardised query parameter: ?query=
  • the URL-encoded version of the query: DESCRIBE%20%3Chttp%3A%2F%2Fexample.com%2Fdataset%2Fx%3E for the query DESCRIBE <http://example.com/dataset/x>
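
The three parts listed above can be assembled mechanically. The following Python sketch is one possible way to do so, using only the standard library. The function name describe_request is an assumption for illustration, and the sketch assumes the endpoint URI does not already carry a query string.

```python
from urllib.parse import quote


def describe_request(endpoint: str, resource_uri: str) -> str:
    """Build an HTTP GET link that sends a SPARQL DESCRIBE query for one resource."""
    query = f"DESCRIBE <{resource_uri}>"
    # safe="" percent-encodes every reserved character, including ":" and "/".
    return f"{endpoint}?query={quote(query, safe='')}"


print(describe_request("http://location.com/sparql", "http://example.com/dataset/x"))
# prints http://location.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fexample.com%2Fdataset%2Fx%3E
```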