Overview

Diff takes two data models (called a "minuend" model and a "subtrahend" model), compares them, and returns all triples in the minuend model that are not in the subtrahend model.

For example, a Diff between model M1 and M2, where M1 contains records (A, B, C, D, E) and M2 contains (C, D, E, F, G) would return the set (A, B). Reversing the arguments would return (F, G).
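The M1/M2 example above can be reproduced with ordinary Java sets. This is only a minimal sketch of the set-difference idea, not the Harvester's own code: the class name `DiffSketch` and its `diff` helper are hypothetical, and plain strings stand in for RDF triples.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DiffSketch {
    /** Returns every element of the minuend that is not in the subtrahend. */
    static <T> Set<T> diff(Set<T> minuend, Set<T> subtrahend) {
        Set<T> result = new LinkedHashSet<>(minuend); // copy so the inputs stay untouched
        result.removeAll(subtrahend);
        return result;
    }

    public static void main(String[] args) {
        Set<String> m1 = new LinkedHashSet<>(List.of("A", "B", "C", "D", "E"));
        Set<String> m2 = new LinkedHashSet<>(List.of("C", "D", "E", "F", "G"));
        System.out.println(diff(m1, m2)); // [A, B]
        System.out.println(diff(m2, m1)); // [F, G]
    }
}
```

The shared records (C, D, E) appear in neither result, which is why the order of the arguments matters.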

Usage

Diff is commonly used in Harvester scripts immediately before newly harvested data is entered into VIVO. When the result of the previous harvest run is the minuend and the new input is the subtrahend, Diff outputs the items that are already in VIVO from the previous harvest but are absent from the new harvest. These are assumed to have been removed or replaced, so this result (called the "subtraction file") is used to remove data from VIVO. When the arguments are reversed, the result is the items in the new harvest that were not previously harvested. This result (called the "addition file") is used to add data to VIVO.

The subtraction file and the addition file are then applied to VIVO (the -m switch for Transfer is used for the subtractions), and also to the "previous harvest model" so that it stays up-to-date allowing Diff to work properly for future runs.

Data in the intersection of the two sets is not output by either execution of Diff: it is identical in the input and in VIVO, so there is nothing to handle. In this way Diff avoids unnecessary load on the production VIVO database.

Selective Diff

When diffing to create subtraction models, some processes may need to preserve entities in the minuend (previous harvest) that are entirely absent from the subtrahend (new input). The selective diff mode generates a diff model as in the normal process, but prunes the result so that core elements completely absent from the subtrahend are not included. This lets changes to properties pass through, but prevents the removal of large swaths of data when the minuend contains a large number of entries and the subtrahend includes updates to only a few of them.
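The pruning idea can be illustrated with a small sketch. It rests on assumptions that are not taken from Diff.java: triples are modeled as three-element string lists, a "core element" is read as a triple's subject, and the names `SelectiveDiffSketch`, `triple`, `diff` and `selectiveDiff` are hypothetical. The actual implementation works on Jena models with SPARQL and rdf:type filters.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SelectiveDiffSketch {
    /** Builds a triple as an immutable [subject, predicate, object] list. */
    static List<String> triple(String s, String p, String o) {
        return List.of(s, p, o);
    }

    /** Plain diff: triples in the minuend that are not in the subtrahend. */
    static Set<List<String>> diff(Set<List<String>> minuend, Set<List<String>> subtrahend) {
        Set<List<String>> result = new LinkedHashSet<>(minuend);
        result.removeAll(subtrahend);
        return result;
    }

    /** Selective diff: keep only diff triples whose subject also occurs in the subtrahend. */
    static Set<List<String>> selectiveDiff(Set<List<String>> minuend, Set<List<String>> subtrahend) {
        Set<String> knownSubjects = new HashSet<>();
        for (List<String> t : subtrahend) {
            knownSubjects.add(t.get(0)); // assumption: subject = "core element"
        }
        Set<List<String>> pruned = new LinkedHashSet<>();
        for (List<String> t : diff(minuend, subtrahend)) {
            if (knownSubjects.contains(t.get(0))) {
                pruned.add(t);
            }
        }
        return pruned;
    }

    public static void main(String[] args) {
        // Previous harvest knows two people; the new input only updates person1.
        Set<List<String>> previous = new LinkedHashSet<>(List.of(
                triple("person1", "rdfs:label", "Smith, J"),
                triple("person2", "rdfs:label", "Doe, A")));
        Set<List<String>> incoming = new LinkedHashSet<>(List.of(
                triple("person1", "rdfs:label", "Smith, John")));

        System.out.println(diff(previous, incoming).size());          // 2: both old labels
        System.out.println(selectiveDiff(previous, incoming).size()); // 1: only person1's old label
    }
}
```

In this sketch person2 is untouched by the subtraction because the new input says nothing about it, while the changed label on person1 still passes through.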

Diff.java

Parameters

Short Option	Long Option	Parameter Value Map	Description	Required
m	minuend	CONFIG_FILE	config file for input jena model	true
M	minuendOverride	JENA_PARAM=VALUE	override the JENA_PARAM of input jena model config using VALUE	false
s	subtrahend	CONFIG_FILE	config file for input jena model	true
S	subtrahendOverride	JENA_PARAM=VALUE	override the JENA_PARAM of input jena model config using VALUE	false
o	output	CONFIG_FILE	config file for output jena model	true
O	outputOverride	JENA_PARAM=VALUE	override the JENA_PARAM of output jena model config using VALUE	false
d	dumptoFile	FILENAME	filename for output	true
e	selective-diff		use selective diff	false
U	update-types		type to be updated with selective diff	false

Properties

Name	Type	Visibility	Description
log	Logger	private	SLF4J Logger statically set at the top of the file
minuendJC	JenaConnect	private
subtrahendJC	JenaConnect	private
output	JenaConnect	private
dumpFile	String	private	String of the file name as set by the user
updateTypes	List<String>	private	List of types to be updated in selectiveDiff
bUsingSelectiveDiff	boolean	private	Whether or not we are using selectiveDiff. Implicitly true if updateTypes specified.
bHasUpdateTypes	boolean	private	Whether or not types are specified for use by selectiveDiff

Execution

-m config/jenaModels/h2.xml -M dbUrl="jdbc:h2:XMLVault/h2Pubmed/score/store" -M modelName=pubmedScore -s config/jenaModels/VIVO.xml -S modelName="http://vivoweb.org/ingest/pubmed" -d XMLVault/update_Additions.rdf.xml

Methods

getParser

parse the arguments from the parameter list above

diff

create a diff model
get the minuend and subtrahend models
run jena's .difference method: minuend.difference(subtrahend)
check for null dumpfile
1. write out if not null to file
check for null output
1. write out if not null to output jenaconnect
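The steps above can be sketched in plain Java. This is an illustration under stated assumptions rather than the method in Diff.java: string sets stand in for Jena models, a `Path` stands in for the dump file name, a mutable set stands in for the output JenaConnect, and `DiffOutputSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class DiffOutputSketch {
    /**
     * Computes minuend minus subtrahend, then writes the result to each
     * destination that is not null. Either destination may be omitted.
     */
    static Set<String> diff(Set<String> minuend, Set<String> subtrahend,
                            Path dumpFile, Set<String> output) {
        Set<String> result = new TreeSet<>(minuend); // sorted copy of the minuend
        result.removeAll(subtrahend);

        if (dumpFile != null) {
            try {
                Files.write(dumpFile, result); // one entry per line
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        if (output != null) {
            output.addAll(result); // stand-in for loading into the output model
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dump = Files.createTempFile("diff-sketch", ".txt");
        Set<String> output = new HashSet<>();
        diff(Set.of("A", "B", "C"), Set.of("C", "D"), dump, output);
        System.out.println(Files.readAllLines(dump)); // [A, B]
        System.out.println(output.size());            // 2
    }
}
```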

selectiveDiff

create a diff model
get the minuend and subtrahend models
run jena's .difference method: minuend.difference(subtrahend)
load subtrahend and diff models into temp memJena for multi-graph SPARQL query.
if updateTypes are specified:
1. for each type: create a difference model fragment containing the changes to properties of the type, preserving the root node.
2. append fragment to new diff model.
if types are not specified:
1. create a new difference model containing changes to properties of all types, preserving all root nodes.
check for null dumpfile
1. write out if not null to file
check for null output
1. write out if not null to output jenaconnect

buildPreservationQuery

build and return SPARQL query string, taking an rdf:type from the updateTypes list.

execute

call diff sending it the minuend, the subtrahend, the outputs

main

Start Logger
Run diff.execute passing the args[] to the constructor
catch errors
1. IllegalArgumentException
2. IOException
3. Exception

traceModel

build a ResultSet from a ?s ?p ?o query on the model.
for each query solution, trace solution.toString() to the logger.

Example

Update

How Diff is used to do "graph math" updating

V = VIVO model

H = new harvested model

P = previous harvest model

A = additions

S = subtractions

A = Diff (H → P)	The additions are the triples that are in the new harvest when the old harvest triples are removed.
S = Diff (P → H)	The subtractions are the triples that are in the old harvest when the new harvest is removed.

P = P – S

P = P + A	The application of these to the old harvest will make the old harvest equal to the new harvest.

V = V – S

V = V + A	The application of these to the VIVO model will make the information in the VIVO model agree with the harvest without tampering with the data.
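The update cycle can be checked with a small worked example. The sketch below is an assumption-laden illustration: string sets stand in for the models, the names `GraphMathSketch`, `diff` and `apply` are hypothetical, and the entry "curated" stands for data in VIVO that did not come from the harvest.

```java
import java.util.Set;
import java.util.TreeSet;

class GraphMathSketch {
    /** Diff (minuend → subtrahend): entries of the minuend not in the subtrahend. */
    static Set<String> diff(Set<String> minuend, Set<String> subtrahend) {
        Set<String> result = new TreeSet<>(minuend);
        result.removeAll(subtrahend);
        return result;
    }

    /** Applies the subtractions, then the additions, to a copy of the model. */
    static Set<String> apply(Set<String> model, Set<String> subtractions, Set<String> additions) {
        Set<String> result = new TreeSet<>(model);
        result.removeAll(subtractions);
        result.addAll(additions);
        return result;
    }

    public static void main(String[] args) {
        Set<String> p = Set.of("t1", "t2", "t3");            // previous harvest
        Set<String> h = Set.of("t2", "t3", "t4");            // new harvest
        Set<String> v = Set.of("t1", "t2", "t3", "curated"); // VIVO, with non-harvest data

        Set<String> a = diff(h, p); // additions
        Set<String> s = diff(p, h); // subtractions

        System.out.println(a);                        // [t4]
        System.out.println(s);                        // [t1]
        System.out.println(apply(p, s, a).equals(h)); // true: previous harvest now equals new harvest
        System.out.println(apply(v, s, a));           // [curated, t2, t3, t4]
    }
}
```

Only the entries named in S and A are touched, so the non-harvest entry in V survives the update.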
