Determine trust from the source of a dataset in a data conflation service

Many organizations acquire data about the same real-world object in varied contexts (e.g. a shopping centre for the ‘cadastre’ is not the same as a shopping centre for the fire department). Moreover, each agency believes that its information is the best and is unwilling to compromise on its perception of the truth. Incident management is more effective when, instead of working on similar datasets separately, work is carried out on a conflated dataset that contains a ‘single truth’: one that retains the best geometric accuracy, reduces redundancy, reconciles data conflicts, and provides richer attributes.

An example is the Points of Interest (POI) datasets from three Western Australian authorities: Landgate (state land authority), WAPOL (Western Australian Police), and DFES (Department of Fire and Emergency Services). Many issues complicate the conflation process. The one related to data provenance is that each set of POI data was generated from various sources. For example, some of the Landgate POIs were extracted from topographic geodatabases or digitised from orthoimagery, while DFES and WAPOL collected geospatial information for many of their POIs from individual company websites, Yellow Pages, and other government resources where available. Thus, for each POI that exists in more than one dataset, which location is the most authoritative (e.g. fit for use)?

For further reasoning on a dataset’s fitness for use, a trust model is executed. For instance, a new trust value is generated according to a simple rule: the higher the source ranks on the list of trusted sources, the higher the trust value.
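
The simple rule above could be sketched as follows. This is only an illustration: the function name, the ordering of the hypothetical ranked list, and the linear scaling to a value between 0 and 1 are all assumptions, since the use case does not prescribe how the trust value is computed.

```python
from typing import Sequence

# Hypothetical ranked list of sources, most trusted first.
# The source types are taken from the POI example; the ORDER is an assumption.
TRUSTED_SOURCES = [
    "topographic geodatabase",
    "orthoimagery",
    "government resource",
    "company website",
    "Yellow Pages",
]


def trust_from_source(source: str, trusted_sources: Sequence[str]) -> float:
    """Map a source's rank in the list to a trust value in [0, 1].

    A higher position in the list yields a higher trust value.
    Sources that are not on the list get 0.0 (an assumption).
    """
    if source not in trusted_sources:
        return 0.0
    rank = trusted_sources.index(source)  # 0 means most trusted
    count = len(trusted_sources)
    return (count - rank) / count


if __name__ == "__main__":
    for name in TRUSTED_SOURCES:
        print(name, trust_from_source(name, TRUSTED_SOURCES))
```

Any monotonic mapping from rank to value would satisfy the rule; the linear one is used here only because it is easy to inspect.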

RDA Group: 
Provenance Patterns WG
Contributor: 
Ivana Ivanova
Actors: 
Data conflation web service
Goal: 
• To be able to retrieve source information of the dataset.
• To validate source information against a rule on trust.
• To record the data conflation provenance.
Summary: 
This use-case demonstrates the use of provenance information for determining the trust of a dataset resulting from a data conflation.
Preconditions: 
Data is identifiable and discoverable
Provenance is documented (e.g. using ISO 19115) in a machine readable format
Postconditions: 
Results are provided in a machine-readable format.
The conflation process and its provenance are documented in a standard-compliant and machine-readable format.
Steps: 
Start the data conflation service
Visit the data page and filter candidate datasets by the context (e.g. select ‘supermarket POI’)
Execute the conflation service and produce a new, conflated dataset
Navigate to the ‘source’ information of the candidate datasets and access the ‘source’ value
Execute the ‘trust computation model’
Assign the ‘trust’ value to the new dataset
Document the provenance of the new, conflated dataset and its ‘trust’ value
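
The steps from executing the conflation through documenting provenance might look like the minimal sketch below. Everything in it is hypothetical: the `Dataset` class, the dataset names, the ranked list, the choice to give the conflated dataset the trust of its least trusted contributor, and the keys of the provenance record. In particular, the record is a plain dictionary for illustration, not an ISO 19115 encoding, and the conflation itself is reduced to a stand-in.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional

# Hypothetical ranked list of sources, most trusted first (assumption).
TRUSTED_SOURCES = ["topographic geodatabase", "orthoimagery", "company website"]


@dataclass
class Dataset:
    name: str
    source: Optional[str] = None   # 'source' value read from provenance metadata
    trust: Optional[float] = None
    provenance: Dict[str, object] = field(default_factory=dict)


def source_trust(source: Optional[str]) -> float:
    """Higher position in TRUSTED_SOURCES gives a higher value; unknown gives 0.0."""
    if source not in TRUSTED_SOURCES:
        return 0.0
    count = len(TRUSTED_SOURCES)
    return (count - TRUSTED_SOURCES.index(source)) / count


def conflate_with_trust(candidates: List[Dataset]) -> Dataset:
    if not candidates:
        raise ValueError("at least one candidate dataset is required")

    # Stand-in for the conflation step: a real service would merge
    # geometries and attributes here.
    conflated = Dataset(name="conflated_poi")

    # Read each candidate's 'source' value and run the trust model on it.
    per_candidate = {d.name: source_trust(d.source) for d in candidates}

    # Assign a trust value to the new dataset. Taking the minimum is an
    # assumption; the use case does not define the aggregation.
    conflated.trust = min(per_candidate.values())

    # Document the provenance of the conflated dataset and its trust value.
    conflated.provenance = {
        "generated_by": "data conflation service",
        "derived_from": [d.name for d in candidates],
        "candidate_sources": {d.name: d.source for d in candidates},
        "candidate_trust": per_candidate,
        "trust": conflated.trust,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }
    return conflated


if __name__ == "__main__":
    result = conflate_with_trust([
        Dataset("landgate_poi", source="orthoimagery"),
        Dataset("dfes_poi", source="company website"),
    ])
    print(result.trust)
    print(result.provenance)
```

Keeping the per-candidate trust values in the record, alongside the aggregated value, lets a later consumer re-evaluate the result under a different rule.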