A basic data processing model

Contributor: 
Nicholas Car

Introduction

Many organisations want a template for describing a data processing event in PROV-O terms. They understand the benefits of a graph-based modelling system like PROV-O but need something simpler to implement than the open-world assumption of OWL, perhaps on top of legacy metadata systems such as catalogues.

Here is a simple PROV-O template that relates the fundamental aspects of a processing event, such as its timing, inputs, outputs and causative agents. The template indicates which items about a data processing event should be recorded and how they should be related, according to PROV-O.

A basic template

Data and other things are used within a processing event (used by an Activity) to produce more data (an Entity). The output Entity wasGeneratedBy the processing Activity and the Activity was conducted by (wasAssociatedWith) an Agent: a person or a system. The figure below shows this basic template.
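The relationships in this template can be written down as a handful of subject-predicate-object statements. The following is a minimal sketch in plain Python, not a definitive implementation: the prov: terms (Entity, Activity, Agent, used, wasGeneratedBy, wasAssociatedWith, startedAtTime, endedAtTime) are PROV-O terms, while every ex: identifier, the timestamps and the helper to_turtle are hypothetical names chosen for illustration.

```python
# Minimal sketch of the basic template as (subject, predicate, object) triples.
# ASSUMPTION: all "ex:" identifiers and the timestamps are made up for illustration.
PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/",  # hypothetical namespace
}

triples = [
    # The things involved and their PROV-O classes
    ("ex:input_data", "rdf:type", "prov:Entity"),
    ("ex:output_data", "rdf:type", "prov:Entity"),
    ("ex:processing", "rdf:type", "prov:Activity"),
    ("ex:operator", "rdf:type", "prov:Agent"),
    # Inputs: the Activity used the input Entity
    ("ex:processing", "prov:used", "ex:input_data"),
    # Outputs: the output Entity wasGeneratedBy the Activity
    ("ex:output_data", "prov:wasGeneratedBy", "ex:processing"),
    # Causative agent: the Activity wasAssociatedWith the Agent
    ("ex:processing", "prov:wasAssociatedWith", "ex:operator"),
    # Timing of the processing event
    ("ex:processing", "prov:startedAtTime", '"2024-01-01T09:00:00Z"^^xsd:dateTime'),
    ("ex:processing", "prov:endedAtTime", '"2024-01-01T09:05:00Z"^^xsd:dateTime'),
]


def to_turtle(triples, prefixes=PREFIXES):
    """Serialise the triples as simple Turtle, one statement per line."""
    header = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    body = [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(header + [""] + body)


if __name__ == "__main__":
    print(to_turtle(triples))
```

Holding the statements as plain tuples keeps the sketch independent of any RDF library, which may suit a legacy catalogue that only needs to emit these few relationships.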

The process may be something manual ('columnate data in Excel') or something automatic ('extract a geographic subset using a subsetting query'). Either way, its particulars can be described in a Plan that is input to the Activity (see Pattern 12). It may also be important to record additional configuration needed to usefully describe the situation in which the processing event operated. Such configuration should be recorded in additional Entities used by the Activity.
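As a rough illustration of the paragraph above, a Plan and a configuration Entity can simply be listed alongside the data as things the Activity used. This sketch assumes the same tuple representation as the basic template; the ex: identifiers and the helper used_by are hypothetical, while prov:Plan, prov:Entity and prov:used are PROV-O terms.

```python
# Sketch: a Plan and extra configuration recorded as inputs to the Activity.
# ASSUMPTION: all "ex:" identifiers are made up for illustration.
triples = [
    ("ex:input_data", "rdf:type", "prov:Entity"),
    ("ex:subsetting_plan", "rdf:type", "prov:Plan"),   # describes how the process works
    ("ex:run_config", "rdf:type", "prov:Entity"),      # additional configuration
    ("ex:processing", "rdf:type", "prov:Activity"),
    ("ex:processing", "prov:used", "ex:input_data"),
    ("ex:processing", "prov:used", "ex:subsetting_plan"),
    ("ex:processing", "prov:used", "ex:run_config"),
]


def used_by(triples, activity):
    """Return everything the given Activity used, in recorded order."""
    return [o for s, p, o in triples if s == activity and p == "prov:used"]


if __name__ == "__main__":
    for item in used_by(triples, "ex:processing"):
        print(item)
```

Because prov:Plan is a kind of prov:Entity, the Plan can be treated like any other input when listing what an Activity used.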

Often data processing is conducted by a machine, perhaps a supercomputer, as directed by a person. This is indicated by the Agent running the process (the Activity wasAssociatedWith it) having actedOnBehalfOf another Agent.
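The delegation described above can be sketched as one extra statement between the two Agents, plus a small lookup that follows it. This is an illustrative sketch only: the ex: identifiers and the helper responsible_agents are hypothetical, while prov:wasAssociatedWith, prov:actedOnBehalfOf, prov:Agent and prov:Person are PROV-O terms.

```python
# Sketch: a machine Agent runs the Activity on behalf of a person.
# ASSUMPTION: all "ex:" identifiers are made up for illustration.
triples = [
    ("ex:supercomputer", "rdf:type", "prov:Agent"),
    ("ex:researcher", "rdf:type", "prov:Person"),
    ("ex:processing", "rdf:type", "prov:Activity"),
    ("ex:processing", "prov:wasAssociatedWith", "ex:supercomputer"),
    ("ex:supercomputer", "prov:actedOnBehalfOf", "ex:researcher"),
]


def responsible_agents(triples, activity):
    """Return the Agents associated with the Activity, followed by any
    Agents they acted on behalf of (following delegation chains)."""
    result = [o for s, p, o in triples
              if s == activity and p == "prov:wasAssociatedWith"]
    frontier = list(result)
    while frontier:
        agent = frontier.pop()
        for s, p, o in triples:
            if s == agent and p == "prov:actedOnBehalfOf" and o not in result:
                result.append(o)
                frontier.append(o)
    return result


if __name__ == "__main__":
    print(responsible_agents(triples, "ex:processing"))
```

Following actedOnBehalfOf links in this way lets a catalogue report both the system that ran the process and the person who directed it.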

SVG Image: