Note: this is an approach that has been used at Cornell. Other approaches are used at other sites.
- Data in VIVO comes from ingested sources and from manual editing.
- Some VIVO sites do not allow manual editing by users; this simplifies ingest, since there is no manually edited data to reconcile with the ingested data.
- A separate VIVO instance is used for ingest.
- This instance is populated from the nightly backup of the production instance.
- Running ingest against this separate instance means that the production instance is not slowed by the ingest workload.
- Ingest processes run at night.
- Since ingested data is largely separate from manually editable data, conflicts between the two are unlikely; the main concern is the load that ingest would otherwise place on the production system.
- Ingest processes are run that compare the new source data to the data already in VIVO.
- They generate the RDF triples that must be added to or removed from VIVO to represent the new data (a sketch appears after this list).
- Because we do not apply these triples immediately, we can inspect them for correctness before committing them.
- The reviewed RDF triples are then applied to the production VIVO system (a second sketch below shows one possible way to do this).
- These processes are ad hoc and idiosyncratic to Cornell’s data sources and ontology extensions. They are constantly being changed and are not packaged for release.
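
The compare-and-generate step can be sketched with rdflib in Python. This is not Cornell's released tooling; the function name, file names, and formats below are illustrative assumptions. Given a file of newly harvested triples and an export of the corresponding data from the ingest VIVO, a graph difference yields the triples to add and the triples to retract, written to N-Triples files so they can be reviewed before anything is committed.

```python
# Illustrative sketch of the compare step (not Cornell's released tooling).
# Assumes rdflib; file names and the function name are placeholders.
from rdflib import Graph

def diff_against_vivo(new_data_path, current_vivo_path,
                      additions_path="additions.nt",
                      retractions_path="retractions.nt"):
    """Compare newly harvested RDF with the RDF already in the ingest VIVO
    and write the triples to add and to retract as N-Triples files."""
    new_graph = Graph().parse(new_data_path, format="nt")       # freshly harvested data
    vivo_graph = Graph().parse(current_vivo_path, format="nt")  # export from the ingest VIVO

    # Plain set difference on triples; note that blank nodes will not
    # match across the two graphs with this approach.
    additions = new_graph - vivo_graph      # in the new data, not yet in VIVO
    retractions = vivo_graph - new_graph    # in VIVO, absent from the new data

    additions.serialize(destination=additions_path, format="nt")
    retractions.serialize(destination=retractions_path, format="nt")
    return len(additions), len(retractions)

if __name__ == "__main__":
    added, dropped = diff_against_vivo("harvested_people.nt", "vivo_people_export.nt")
    print(f"{added} triples to add, {dropped} triples to retract -- review before applying")
```

In practice the retraction side would be scoped to the classes and predicates the ingest source is authoritative for, so that statements created by manual editing are not removed.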
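One possible way to apply the reviewed files to production is through VIVO's SPARQL Update API, if it is enabled on the production instance. The endpoint path, the email/password/update parameters, and the target graph URI below are assumptions to be checked against the VIVO version in use; they are not part of the Cornell scripts described above.

```python
# Illustrative sketch of applying reviewed add/retract files to production.
# The /api/sparqlUpdate endpoint, its email/password/update parameters, and
# the target graph URI are assumptions -- verify against your VIVO version.
import requests

VIVO_URL = "https://vivo.example.edu/vivo"  # placeholder
TARGET_GRAPH = "http://vitro.mannlib.cornell.edu/default/vitro-kb-2"  # assumed content graph

def post_update(update_text, email, password):
    resp = requests.post(
        VIVO_URL + "/api/sparqlUpdate",
        data={"email": email, "password": password, "update": update_text},
    )
    resp.raise_for_status()

def apply_reviewed_files(additions_path, retractions_path, email, password):
    """Send the reviewed N-Triples as SPARQL DELETE DATA / INSERT DATA updates.
    Large files would need to be split into batches; blank nodes are not
    permitted in DELETE DATA."""
    with open(retractions_path) as f:
        retractions = f.read()
    with open(additions_path) as f:
        additions = f.read()

    if retractions.strip():
        post_update("DELETE DATA { GRAPH <%s> { %s } }" % (TARGET_GRAPH, retractions),
                    email, password)
    if additions.strip():
        post_update("INSERT DATA { GRAPH <%s> { %s } }" % (TARGET_GRAPH, additions),
                    email, password)
```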