
back up to How to plan data ingest for VIVO

previous topic: Ingest tools: home brew or off the shelf?

Note: this is an approach that has been used at Cornell. Other approaches are used at other sites.

  • Data in VIVO comes from ingested sources and from manual editing.
    • Some VIVO sites do not allow manual editing by users. This can simplify the ingest task.
    • A separate VIVO instance is used for ingest.
      • This instance is populated from the nightly backup of the production instance.
      • The use of a separate VIVO means that the production instance is not loaded down by the ingest process.
      • Ingest processes run at night.
        • Since ingested data is largely separate from manually edited data, conflicts are unlikely; the main concern is the load the ingest places on the system.
        • Ingest processes compare the new source data to the data already in VIVO.
        • They generate the RDF triples that must be added to or removed from VIVO to represent the new data (see the first sketch after this list).
        • Because these triples are not applied immediately, they can be inspected for correctness before being committed.
        • The reviewed RDF triples are then applied to the production VIVO system (the second sketch after this list shows one possible mechanism).
        • These processes are ad hoc and idiosyncratic to Cornell’s data sources and ontology extensions. They are constantly being changed, and are not packaged for release.
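
The compare-and-generate step can be prototyped with standard RDF tooling. The following is a minimal sketch in Python using rdflib, assuming the new source extract and the corresponding data currently in VIVO are available as N-Triples files; the file names are hypothetical, and the actual Cornell processes are not published.

# Sketch: diff a new source extract against the data currently in VIVO,
# producing separate "add" and "remove" triple sets for review.
from rdflib import Graph
from rdflib.compare import to_isomorphic, graph_diff

current = Graph().parse("vivo_current_extract.nt", format="nt")   # from the ingest instance
incoming = Graph().parse("source_feed_new.nt", format="nt")       # from the source system

# graph_diff returns (triples in both, only in first, only in second)
in_both, removals, additions = graph_diff(to_isomorphic(current),
                                          to_isomorphic(incoming))

# Serialize the two sets so they can be inspected for correctness
# before anything is committed to production.
removals.serialize(destination="removals.nt", format="nt")
additions.serialize(destination="additions.nt", format="nt")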
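
Once the add and remove sets have been reviewed, they can be applied to production. The Cornell processes described here predate it, but more recent VIVO releases expose a SPARQL Update API that could carry out this step; the URL, credentials, and batch handling below are placeholder assumptions, and large triple sets would need to be applied in batches.

# Sketch: apply reviewed triple sets to a production VIVO
# through its SPARQL Update API (available in VIVO 1.6 and later).
import requests

VIVO_UPDATE_URL = "http://vivo.example.edu/vivo/api/sparqlUpdate"  # placeholder
KB_GRAPH = "http://vitro.mannlib.cornell.edu/default/vitro-kb-2"   # VIVO's default content graph

def apply_ntriples(path, keyword):
    """POST an N-Triples file as a SPARQL INSERT DATA or DELETE DATA request."""
    with open(path) as f:
        triples = f.read()
    update = "%s DATA { GRAPH <%s> {\n%s\n} }" % (keyword, KB_GRAPH, triples)
    resp = requests.post(VIVO_UPDATE_URL, data={
        "email": "vivo_root@example.edu",  # placeholder admin account
        "password": "CHANGE_ME",           # placeholder
        "update": update,
    })
    resp.raise_for_status()

apply_ntriples("removals.nt", "DELETE")   # retract obsolete triples first
apply_ntriples("additions.nt", "INSERT")  # then add the new ones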

 

next topic: Challenges for data ingest
