Major options include the Harvester, semantic ingest tools such as Karma, and XSLT.


Major options

Data ingest for VIVO is the process of transforming an existing or new source of data into RDF and loading that RDF into VIVO's data store, called a triple store after the three-part data statements it contains.  VIVO ships configured to use an open source triple store, Jena SDB, implemented on top of one of several off-the-shelf relational databases – in most cases MySQL, although at least one VIVO site, Melbourne's Find an Expert site, is exploring IBM triple store technology implemented via Oracle. In addition, developers at Cornell and Florida are leveraging the new RDF API in VIVO 1.5 to experiment with other triple stores, including Sesame, Virtuoso, and OWLIM.

Imagine first that you have a magic black box that converts each data source on your list into RDF compatible with the VIVO ontology.  Loading that RDF into VIVO can then be as simple as logging into VIVO as a site administrator and loading the RDF via the Add/Remove RDF Data command in the Advanced Data Tools section of the site admin menu.
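As a rough sketch of what such VIVO-compatible RDF might look like, the following Python snippet (using the rdflib library; the namespace, individual ID, and label are hypothetical) builds a single person statement and serializes it to a file that could be uploaded through Add/Remove RDF Data:

    from rdflib import Graph, Literal, Namespace, RDF, RDFS

    # Hypothetical namespace; replace with your own VIVO instance's default namespace.
    LOCAL = Namespace("http://vivo.example.edu/individual/")
    FOAF = Namespace("http://xmlns.com/foaf/0.1/")

    g = Graph()
    g.bind("foaf", FOAF)

    person = LOCAL["n1234"]                      # hypothetical individual URI
    g.add((person, RDF.type, FOAF.Person))       # a class the VIVO ontology builds on
    g.add((person, RDFS.label, Literal("Smith, Jane")))

    # Write RDF/XML that can be loaded via the Add/Remove RDF Data form.
    g.serialize(destination="jane_smith.rdf", format="xml")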

Caveats

Unfortunately it's rarely quite that simple.

Okay, you've heard that before

You likely have some experience with ETL (extract, transform, and load) processes, and you have heard about these problems before. This is good – you are aware that while VIVO is the challenge you are taking on now, getting data into VIVO is not that different from getting data into other platforms.

Unmasking the black box

You have at least three choices:

The VIVO Harvester

The VIVO Harvester can be configured for a wide variety of ingest tasks.

The Harvester has been extensively documented throughout its lifetime by its original developers at the University of Florida and by VIVO developers and implementers at other institutions. Please see the Ingesting and maintaining data section for full details.

Working with semantic rather than scripting tools

Ideally, each significant source of data at an implementing institution will first be described by its own local ontology that models the data source as it is made available to the project. By reflecting the data as it comes to you in an ontology, you are in a better position to detect changes (either additions or deletions) in the source over time, and you can reduce or transform the data transferred to your local VIVO instance.
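As an illustration only (the namespace and terms here are hypothetical), a local source ontology for an HR feed might declare classes and properties that stay deliberately close to the shape of the incoming data, sketched here with rdflib:

    from rdflib import Graph, Namespace, RDF, RDFS
    from rdflib.namespace import OWL

    # Hypothetical namespace for a local HR data source.
    HR = Namespace("http://vivo.example.edu/ontology/hr#")

    onto = Graph()
    onto.bind("hr", HR)

    # Classes and properties mirror the source system's own fields,
    # so additions and deletions in the feed are easier to spot over time.
    onto.add((HR.Employee, RDF.type, OWL.Class))
    onto.add((HR.jobTitle, RDF.type, OWL.DatatypeProperty))
    onto.add((HR.jobTitle, RDFS.domain, HR.Employee))

    onto.serialize(destination="hr-ontology.owl", format="xml")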

One of the strengths of a semantic approach is that by creating mappings from that source ontology to the VIVO ontology, much of the work of processing the data is not only clearer but can be accomplished without writing scripts or Java code. A programming approach might come more naturally to you at first, but it may prove to be more work and less transparent to maintain. Limiting yourself to simple data formats such as spreadsheets or .csv files can be the equivalent of using a very small pipe to connect the semantically rich data in your source with the semantically rich data in VIVO.
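One way to express such a mapping, sketched here with the same hypothetical HR terms, is a SPARQL CONSTRUCT query that restates source-ontology triples in VIVO-compatible terms; the same intent could also be captured as rdfs:subClassOf or owl:equivalentClass assertions applied by a reasoner:

    from rdflib import Graph

    # Source data expressed in the hypothetical local HR ontology.
    src = Graph()
    src.parse("hr-data.ttl", format="turtle")

    # A declarative mapping rather than procedural transformation code.
    mapping = """
    PREFIX hr:   <http://vivo.example.edu/ontology/hr#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    CONSTRUCT {
        ?person a foaf:Person ;
                rdfs:label ?name .
    }
    WHERE {
        ?person a hr:Employee ;
                hr:displayName ?name .
    }
    """

    vivo_ready = src.query(mapping).graph
    vivo_ready.serialize(destination="hr-for-vivo.rdf", format="xml")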

Working in the RDF world as early as possible in the data ingest process will also train you in using the tools available for querying data in VIVO itself (e.g., using SPARQL to run reports), for making VIVO data available as web services for consumption on other websites, or for mapping data exported from VIVO into other tools such as the Digital Vita tool developed at Pittsburgh.
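For example, a short SPARQL query (shown here run with rdflib against a hypothetical RDF export of VIVO data) can serve as a simple report of people and their labels:

    from rdflib import Graph

    g = Graph()
    g.parse("vivo-export.rdf")   # hypothetical export of the VIVO triple store

    report = g.query("""
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

        SELECT ?person ?name
        WHERE {
            ?person a foaf:Person ;
                    rdfs:label ?name .
        }
        ORDER BY ?name
    """)

    for person, name in report:
        print(person, name)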

The logic and application of semantic mappings are discussed extensively in the recommended book, "Semantic Web for the Working Ontologist," which includes many short examples and a step-by-step introduction to RDF and OWL capabilities.


next topic: Typical ingest processes