Documenting provenance in an automated federation of spatial information

Many applications require distributed authoritative spatial data to be joined, either because activities occur at borders or because a uniform view of the data is required. The required syntactic and semantic harmonization can be done at various locations in an information architecture. Because jurisdictions use different spatial data schemas and formats, database interoperability is an issue. Federating spatial data automatically using semantic web techniques would unify the disparate datasets, giving easier access to the data while removing the semantic burden from the users of these datasets.

An example is a federated flood management dataset: a land information management organization is investigating how it can collect and access flood-related information stored and managed by regional councils. The aim is to access the relevant spatial data online through web services, without the need to copy, transform or process the data into a particular organizational schema. Documenting the provenance of this process is paramount for a reliable and reusable result.

RDA Group: Provenance Patterns WG
Ivana Ivanova
Data federation web service
• To be able to automatically record prospective provenance of a dataset
This use-case describes how to record provenance in a data federation service.
Data is identifiable and discoverable.
Provenance is documented (e.g. using ISO 19115) in a machine-readable format.
Results are provided in a machine-readable format.
The federation process and its provenance are documented in a standard-compliant and machine-readable format.
Authenticated users search for an address via a geocoder or simply navigate on the map and select a location.
The application pre-processes the selected location and sends it to the WFS federation service endpoint.
The federation server transforms the WFS request into a SPARQL query on a federated ontology.
The ontology will source its data from both local and remote data sources.
The responses are collated and formatted before being sent back to the user.
The provenance of the whole process is tracked and registered.
The result is displayed on the map along with the result's provenance report.
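
The steps above can be illustrated with a minimal Python sketch. It is not the service's actual implementation: the function names (`bbox_to_sparql`, `federate`), the `ProvenanceLog` class, the step labels and the example endpoint URLs are all hypothetical. It assumes the selected location arrives as a bounding box, that the local and remote stores expose geometries through GeoSPARQL, and that remote sources are reached with SPARQL 1.1 `SERVICE` clauses. The provenance entries are loosely modelled on the W3C PROV notions of activity, used and generated; mapping them to ISO 19115 lineage is left out.

```python
import json
from datetime import datetime, timezone


def bbox_to_sparql(bbox, remote_endpoints):
    """Translate a bounding box into a federated GeoSPARQL query (assumed mapping)."""
    min_x, min_y, max_x, max_y = bbox
    wkt = (f"POLYGON(({min_x} {min_y}, {max_x} {min_y}, {max_x} {max_y}, "
           f"{min_x} {max_y}, {min_x} {min_y}))")
    pattern = (
        "?feature geo:hasGeometry/geo:asWKT ?wkt . "
        f'FILTER(geof:sfIntersects(?wkt, "{wkt}"^^geo:wktLiteral))'
    )
    # One branch for the local store, one SERVICE branch per remote source.
    branches = [f"{{ {pattern} }}"]
    branches += [f"{{ SERVICE <{url}> {{ {pattern} }} }}" for url in remote_endpoints]
    return (
        "PREFIX geo: <http://www.opengis.net/ont/geosparql#>\n"
        "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>\n"
        f"SELECT ?feature ?wkt WHERE {{ {' UNION '.join(branches)} }}"
    )


class ProvenanceLog:
    """Collects one entry per processing step (hypothetical structure)."""

    def __init__(self):
        self.activities = []

    def record(self, name, used, generated):
        self.activities.append({
            "activity": name,
            "used": list(used),
            "generated": list(generated),
            "endedAtTime": datetime.now(timezone.utc).isoformat(),
        })

    def report(self):
        """Return the log as a machine-readable JSON string."""
        return json.dumps({"activities": self.activities}, indent=2)


def federate(bbox, remote_endpoints, run_query):
    """Run the federation steps and log each one.

    `run_query` is a caller-supplied function that executes a SPARQL string
    and returns rows as dicts with "feature" and "wkt" keys (an assumption).
    """
    log = ProvenanceLog()

    query = bbox_to_sparql(bbox, remote_endpoints)
    log.record("translate-request", ["wfs-request"], ["sparql-query"])

    rows = run_query(query)
    log.record("execute-query", ["sparql-query", *remote_endpoints], ["result-rows"])

    features = [{"id": row["feature"], "wkt": row["wkt"]} for row in rows]
    log.record("collate-response", ["result-rows"], ["response"])

    return features, log


if __name__ == "__main__":
    # Stub executor standing in for a real SPARQL client.
    def stub(query):
        return [{"feature": "urn:example:f1", "wkt": "POINT(1 1)"}]

    features, log = federate((0, 0, 2, 2), ["http://example.org/council-a/sparql"], stub)
    print(log.report())
```

The log here is written as each step runs. The same step names could instead be declared before execution if a description of the planned workflow is what needs to be recorded.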