The Division of Sponsored Research (DSR) was established in 1962 by an act of the Florida Legislature to manage and stimulate an expanding and balanced research program. DSR facilitates institutional approval for all extramural proposal submissions, accepts and administers grant awards, and negotiates contracts and other research-related agreements on behalf of the University of Florida.
As a data source, DSR provides a database view, which is then harvested using the JDBCFetch tool.
The mapping of the provided data to VIVO RDF is specified in an XSLT.
The example XSLT is provided here
Provided with the code is an example script to harvest from DSR.
The script begins with the license information and the list of contributors.
#!/bin/bash

#Copyright (c) 2010-2011 VIVO Harvester Team. For full list of contributors, please see the AUTHORS file provided.
#All rights reserved.
#This program and the accompanying materials are made available under the terms of the new BSD license which accompanies this distribution, and is available at
#http://www.opensource.org/licenses/bsd-license.html
#
# Contributors:
#     Christopher Haines, Dale Scheppler, Nicholas Skaggs, Stephen V. Williams, James Pence - initial API and implementation
# Set to the directory where the harvester was installed or unpacked.
# HARVESTER_INSTALL_DIR is set to the location of the installed harvester.
#   If the deb file was used to install the harvester then the
#   directory should be set to /usr/share/vivo/harvester which is the
#   current location associated with the deb installation.
#   Since it is also possible the harvester was installed by
#   uncompressing the tar.gz the setting is available to be changed
#   and should agree with the installation location.
HARVESTER_INSTALL_DIR=/usr/share/vivo/harvester
export HARVEST_NAME=example-dsr
export DATE=`date +%Y-%m-%d'T'%T`

# Add harvester binaries to path for execution
# The tools within this script refer to binaries supplied within the harvester.
#   Since they can be located in another directory their path should be
#   included within the classpath and the path environment variables.
export PATH=$PATH:$HARVESTER_INSTALL_DIR/bin
export CLASSPATH=$CLASSPATH:$HARVESTER_INSTALL_DIR/bin/harvester.jar:$HARVESTER_INSTALL_DIR/bin/dependency/*
export CLASSPATH=$CLASSPATH:$HARVESTER_INSTALL_DIR/build/harvester.jar:$HARVESTER_INSTALL_DIR/build/dependency/*

# Exit on first error
# The -e flag prevents the script from continuing after a tool fails.
#   Continuing after a tool failure is undesirable since the harvested
#   data could be rendered corrupted and incompatible.
set -e

# Supply the location of the detailed log file which is generated during the script.
#   If there is an issue with a harvest, this file proves invaluable in finding
#   a solution to the problem. It has become common practice in addressing a problem
#   to request this file. The passwords and usernames are filtered out of this file
#   to prevent these logs from containing sensitive information.
echo "Full Logging in $HARVEST_NAME.$DATE.log"
if [ ! -d logs ]; then
  mkdir logs
fi
cd logs
touch $HARVEST_NAME.$DATE.log
ln -sf $HARVEST_NAME.$DATE.log $HARVEST_NAME.latest.log
cd ..

#clear old data
# For a fresh harvest, the removal of the previous information maintains data integrity.
#   If you are continuing a partial run or wish to use the old and already retrieved
#   data, you will want to comment out this line since it could prevent you from having
#   the required harvest data.
#rm -rf data
#if [ -d data ]; then
#  mv -f data data.$DATE
#fi
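The commented-out lines at the end of this part of the script suggest an alternative to deleting old data: moving the existing data directory aside under a timestamped name. A minimal sketch of that idea as a reusable function follows. The function name `archive_data_dir` is hypothetical and is not part of the shipped script.

```shell
#!/bin/bash
# Hypothetical helper (not part of the shipped script): move an existing
# data directory aside under a timestamped name instead of deleting it.
#   $1 = directory to archive, for example "data"
#   $2 = timestamp suffix, for example "$DATE"
archive_data_dir() {
    if [ -d "$1" ]; then
        mv -f "$1" "$1.$2"
    fi
}

# Example call inside the harvest script (left commented, as in the original):
# archive_data_dir data "$DATE"
```

If the directory does not exist, the function does nothing and returns success, so it should be safe to call under `set -e`.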
The information is pulled into the system. Because the source is a standard relational database, JDBCFetch is used.
# Execute Fetch
# This stage of the script is where the information is gathered together into one local
#   place to facilitate the further steps of the harvest. The data is stored locally
#   in a format based off of the source. The format is a form of RDF, yet its ontology
#   is too simple to be put into a model and be useful.
# The JDBCFetch tool in particular takes the data from the chosen source described in its
#   configuration XML file and places it into a record set as flat RDF directly
#   related to the rows, columns and tables described in the target database.
harvester-jdbcfetch -X jdbcfetch.config.xml
The XSLT is applied across the data from the fetch.
# Execute Translate
# This is the part of the script where the outside data, in its flat RDF form, is used to
#   create the more linked and descriptive form related to the ontological constructs.
#   The traditional XSL language is used to achieve this part of the work-flow.
harvester-xsltranslator -X xsltranslator.config.xml
The translated data is placed into an RDF database model.
# Execute Transfer to import from record handler into local temp model
# From this stage on the script places the data into a Jena model. A model is a
#   data storage structure similar to a database, but is in RDF.
# The harvester tool Transfer is used to move/add/remove/dump data in models.
# For this call on the transfer tool:
# -s refers to the source translated records file, which was just produced by the translator step
# -o refers to the destination model for harvested data
# -d means that this call will also produce a text dump file in the specified location
harvester-transfer -s translated-records.config.xml -o harvested-data.model.xml -d data/harvested-data/imported-records.rdf.xml
The various namespaces determined during the translation are scored against specific data in the VIVO model.
The scoring process results in a model that contains the score results, which are then used when matching.
The matching process changes the URIs of the matched data to the URIs present in VIVO.
# Execute Score for Grants
# In the scoring phase the data in the harvest is compared to the data within VIVO and a new model
#   is created with the values / scores of the data comparisons.
harvester-score -X score-grants.config.xml

# Execute Score for Sponsor organizations.
# In the scoring phase the data in the harvest is compared to the data within VIVO and a new model
#   is created with the values / scores of the data comparisons.
harvester-score -X score-sponsor.config.xml

# Find matches using scores and rename nodes to matching uri
# Using the data model created by the score phase, the match process changes the harvested uris for
#   comparison values above the chosen threshold within the xml configuration file.
harvester-match -X match-grants.config.xml

# Execute Score for People
# In the scoring phase the data in the harvest is compared to the data within VIVO and a new model
#   is created with the values / scores of the data comparisons.
harvester-score -X score-people.config.xml

# Execute Score for Departments
# In the scoring phase the data in the harvest is compared to the data within VIVO and a new model
#   is created with the values / scores of the data comparisons.
harvester-score -X score-dept.config.xml

# Execute Score for Primary investigator roles
# In the scoring phase the data in the harvest is compared to the data within VIVO and a new model
#   is created with the values / scores of the data comparisons.
harvester-score -X score-pirole.config.xml

# Execute Score for Co-primary investigator roles
# In the scoring phase the data in the harvest is compared to the data within VIVO and a new model
#   is created with the values / scores of the data comparisons.
harvester-score -X score-copirole.config.xml

# Find matches using scores and rename nodes to matching uri
# Using the data model created by the score phase, the match process changes the harvested uris for
#   comparison values above the chosen threshold within the xml configuration file.
# This config differs from the previous match config, in that it removes types and literals from the
#   resources in the incoming model for those that are considered a match.
harvester-match -X match-roles.config.xml
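To make the score-then-match idea more concrete, here is a toy sketch that works on a plain text file rather than on RDF models. It is not how the harvester tools are implemented. The input format (one line per candidate pair: harvested URI, VIVO URI, score), the function name `select_matches`, and the choice to accept scores at or above the threshold are all assumptions made for illustration.

```shell
#!/bin/bash
# Toy illustration only; the real Score and Match tools operate on Jena models.
# Assumed input: one candidate pair per line, "harvested_uri vivo_uri score".
#   $1 = file of scored candidate pairs
#   $2 = threshold (pairs scoring at or above it count as matches; an assumption)
select_matches() {
    awk -v threshold="$2" '
        # Keep a pair only when its numeric score reaches the threshold,
        # and print the harvested URI next to the VIVO URI that replaces it.
        ($3 + 0) >= (threshold + 0) { print $1, $2 }
    ' "$1"
}
```

Each output line pairs a harvested URI with the VIVO URI it would be renamed to; pairs below the threshold are left for the later ChangeNamespace step.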
The smush process is used to remove duplicates from the harvested data.
# Smush to remove duplicates
# Using a particular predicate as an identifying data element the smush tool will rename those
#   resources which have matching values of that predicate to be one resource.
harvester-smush -X smush-grant.config.xml
harvester-smush -X smush-org.config.xml
harvester-smush -X smush-person.config.xml
harvester-smush -X smush-sponsor.config.xml
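The smush idea, collapsing resources that share a value of an identifying predicate into one resource, can be illustrated with a small text-based sketch. This is only a toy and not the Smush tool itself. The input format (one line per resource: URI, then the value of the identifying predicate) and the function name `smush_by_id` are hypothetical.

```shell
#!/bin/bash
# Toy illustration only; the real Smush tool operates on RDF models.
# Assumed input: one resource per line, "uri identifier_value".
# Output: "uri canonical_uri", where canonical_uri is the first URI that was
# seen with the same identifier value.
#   $1 = file listing resources and their identifier values
smush_by_id() {
    awk '{
        # The first URI seen for an identifier becomes the canonical one.
        if (!($2 in canonical)) canonical[$2] = $1
        print $1, canonical[$2]
    }' "$1"
}
```

Resources whose second column differs from the first would be renamed, which is how two harvested records for the same grant end up as a single resource.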
Resources that did not find a match are given URIs within VIVO's namespace.
# Execute ChangeNamespace to get unmatched grants into current name-space
# This is where the new grants from the harvest are given uris within the name-space of VIVO.
#   If there is an issue with uris being in another name-space, this is the phase
#   which should give some light to the problem.
harvester-changenamespace -X changenamespace-grant.config.xml

# Execute ChangeNamespace to get unmatched organizations into current name-space
# This is where the new organizations from the harvest are given uris within the name-space of VIVO.
harvester-changenamespace -X changenamespace-org.config.xml

# Execute ChangeNamespace to get unmatched sponsoring organizations into current name-space
# This is where the new sponsors from the harvest are given uris within the name-space of VIVO.
harvester-changenamespace -X changenamespace-sponsor.config.xml

# Execute ChangeNamespace to get unmatched People into current name-space
# This is where the new people from the harvest are given uris within the name-space of VIVO.
harvester-changenamespace -X changenamespace-people.config.xml

# Execute ChangeNamespace to get unmatched Primary investigator roles into current name-space
# This is where the new PI roles from the harvest are given uris within the name-space of VIVO.
harvester-changenamespace -X changenamespace-pirole.config.xml

# Execute ChangeNamespace to get unmatched Co-primary investigator roles into current name-space
# This is where the new Co-PI roles from the harvest are given uris within the name-space of VIVO.
harvester-changenamespace -X changenamespace-copirole.config.xml

# Execute ChangeNamespace to get unmatched time intervals into current name-space
# This is where the new time intervals from the harvest are given uris within the name-space of VIVO.
harvester-changenamespace -X changenamespace-timeinterval.config.xml
The subtractions and additions are found by comparing the current harvest to the previous harvest model.
*Note: The previous model should be equivalent to the actual data in VIVO.*
If the previously harvested data is edited in VIVO, then that edit should also be applied to the previous model.
# Find Subtractions
# When making the previous harvest model agree with the current harvest, the entries that exist in
#   the previous harvest but not in the current harvest need to be identified for removal.
harvester-diff -X diff-subtractions.config.xml

# Find Additions
# When making the previous harvest model agree with the current harvest, the entries that exist in
#   the current harvest but not in the previous harvest need to be identified for addition.
harvester-diff -X diff-additions.config.xml
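The two diff calls compute set differences in opposite directions. As a toy illustration of that logic, the sketch below treats each line of a text file as one statement and uses `sort` and `comm` in place of the Diff tool. The file layout and the function names `only_in_first`, `find_subtractions`, and `find_additions` are hypothetical; the real tool compares RDF models as described in its configuration files.

```shell
#!/bin/bash
# Toy illustration only: each line of a file stands in for one RDF statement.

# Print the lines of file $1 that do not appear in file $2.
only_in_first() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    # comm requires sorted input; the C locale keeps sort and comm consistent.
    LC_ALL=C sort -u "$1" > "$a_sorted"
    LC_ALL=C sort -u "$2" > "$b_sorted"
    LC_ALL=C comm -23 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}

# Subtractions: in the previous harvest but not in the current one.
#   $1 = previous harvest file, $2 = current harvest file
find_subtractions() { only_in_first "$1" "$2"; }

# Additions: in the current harvest but not in the previous one.
#   $1 = previous harvest file, $2 = current harvest file
find_additions() { only_in_first "$2" "$1"; }
```

Statements present in both harvests appear in neither output, which is why unchanged data is left untouched by the later transfer steps.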
The updates are applied to the previous model and then to VIVO. This should make the previous model equal to the data actually harvested. The data in VIVO that depends on the harvest should likewise now equal the new harvest's data.
# Apply Subtractions to Previous model
harvester-transfer -o previous-harvest.model.xml -r data/vivo-subtractions.rdf.xml -m

# Apply Additions to Previous model
harvester-transfer -o previous-harvest.model.xml -r data/vivo-additions.rdf.xml

# Now that the changes have been applied to the previous harvest and the harvested data in VIVO
#   should agree with the previous harvest, the changes are now applied to the VIVO model.
# Apply Subtractions to VIVO for pre-1.2 versions
harvester-transfer -o vivo.model.xml -r data/vivo-subtractions.rdf.xml -m

# Apply Additions to VIVO for pre-1.2 versions
harvester-transfer -o vivo.model.xml -r data/vivo-additions.rdf.xml
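The claim that applying the subtractions and then the additions brings the previous model in line with the current harvest can be checked on toy data. The sketch below again treats each line of a text file as one statement; it is not the Transfer tool, and the function name `apply_changes` and the file layout are hypothetical.

```shell
#!/bin/bash
# Toy illustration only: each line of a file stands in for one RDF statement.
# Apply a subtractions file and an additions file to a model file and print
# the resulting set of statements.
#   $1 = model file, $2 = subtractions file, $3 = additions file
apply_changes() {
    # Drop every line that appears verbatim in the subtractions file,
    # then append the additions and remove duplicates.
    { grep -vxFf "$2" "$1"; cat "$3"; } | LC_ALL=C sort -u
}
```

With a previous model of `a b c`, subtractions of `a`, and additions of `d`, the result is `b c d`, which is the current harvest in this toy setup.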