Your first Harvest

JDBCFetch allows you to ingest data from a database into VIVO. Unlike PubMed, which is a national data source, your database is specific to your site, so you will need to create the translation file for your ingest yourself. In addition, although a standard workflow exists in the example-script, your ingest may differ from it. For example-jdbc, all of this work has been done for you: the jdbcfetch.config.xml file contains the select queries that harvest the data from demodb, and the jdbc-to-vivo.datamap.xsl file contains a datamap that maps this data into the VIVO ontology. For your own harvest, you'll need to modify both of these files so that your own data is harvested. Finally, you'll need to make sure databaseclone.config.xml contains the proper connection parameters for your local database.

So, let's walk through running the example-jdbc script. First, we'll talk about the data we are harvesting.

The Sample Database

Now, let's set up the harvester to harvest this data.

Harvest Setup

The first run

Three folders will be created: logs, data, and previous-harvest.

The logs folder contains the log from the run, the data folder contains the data from each run, and the previous-harvest folder contains the old harvest data for use during the update process at the end of the script. While you're testing, I would recommend treating each run as the first run (so that no update logic runs). You can do this by removing the previous-harvest folder before running again.
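
If you reset between test runs often, the cleanup can be scripted. The following is a minimal sketch, not part of the harvester itself: the function name reset_previous_harvest and its script_dir parameter are made up for illustration, and it assumes the previous-harvest folder sits directly inside the directory you run the example script from.

```python
import shutil
from pathlib import Path


def reset_previous_harvest(script_dir: str = ".") -> bool:
    """Delete the previous-harvest folder so the next run looks like a first run.

    script_dir is assumed to be the directory the example script is run from.
    Returns True if a folder was removed, False if there was nothing to remove.
    """
    target = Path(script_dir) / "previous-harvest"
    if target.is_dir():
        shutil.rmtree(target)  # removes the folder and everything inside it
        return True
    return False
```

Call reset_previous_harvest() from the script's directory (or pass that directory as the argument) before each test run. It only touches previous-harvest, so the logs and data folders from earlier runs are left in place.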

Inside the data folder, you will find the raw records used during the ingest. To see which RDF statements went into VIVO, you can view the vivo-additions.rdf.xml file. Likewise, to see what the harvester removed (because of updated data), you can view the vivo-subtractions.rdf.xml file. This file will be empty on your first run, since you have no previous harvest to compare the incoming data against.
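
For a quick sanity check after a run, you can count how many resources each file describes instead of reading it by eye. This is a rough sketch using only the Python standard library. The function name count_described_resources is made up for illustration, and it assumes the files are ordinary RDF/XML with an rdf:RDF root element. Note that it counts described resources (the top-level elements under rdf:RDF), not individual statements.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def count_described_resources(rdf_path: str) -> int:
    """Count the top-level elements under rdf:RDF in an RDF/XML file.

    A missing or empty file counts as 0, which is what you would expect
    for vivo-subtractions.rdf.xml on a first run.
    """
    path = Path(rdf_path)
    if not path.is_file():
        return 0
    data = path.read_bytes()
    if not data.strip():
        return 0  # empty file: nothing was added or removed
    root = ET.fromstring(data)
    if root.tag != f"{{{RDF_NS}}}RDF":
        raise ValueError(f"{rdf_path} does not have an rdf:RDF root element")
    return len(root)  # number of direct child elements
```

Pass it the path to vivo-additions.rdf.xml or vivo-subtractions.rdf.xml as it appears on your system. On a first run you would expect a non-zero count for the additions file and 0 for the subtractions file.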

Optimizing

Once you're ready to run a large dataset, it is advisable to switch the record storage from files to a database. Although this will make it harder to find individual records, it will improve speed and performance during the fetch and translate stages. To do so: