Note: this is an approach that has been used at Cornell. Other approaches are used at other sites.
- Data in VIVO comes from ingested sources and from manual editing.
- Some VIVO sites do not allow manual editing by users; this simplifies ingest, since there is no manually edited data to reconcile with the ingested data.
- A separate VIVO instance is used for ingest.
- This instance is populated from the nightly backup of the production instance.
- Running ingest against this separate instance means that the production instance is not slowed by the ingest workload.
- Ingest processes run at night.
- Since ingested data is largely separate from manually editable data, conflicts between the two are unlikely; the main concern is the load that ingest would otherwise place on the production system.
- Ingest processes are run that compare the new source data to the data already in VIVO.
- They generate the RDF triples that must be added to or removed from VIVO to represent the new data (a sketch appears after this list).
- Because we do not apply these triples immediately, we can inspect them for correctness before committing them.
- The reviewed RDF triples are then applied to the production VIVO system (a second sketch below shows one possible way to do this).
- These processes are ad hoc and idiosyncratic to Cornell’s data sources and ontology extensions. They are constantly being changed and are not packaged for release.
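
The compare-and-generate step can be sketched with rdflib in Python. This is not Cornell's released tooling; the function name, file names, and formats below are illustrative assumptions. Given a file of newly harvested triples and an export of the corresponding data from the ingest VIVO, a graph difference yields the triples to add and the triples to retract, written to N-Triples files so they can be reviewed before anything is committed.

```python
# Illustrative sketch of the compare step (not Cornell's released tooling).
# Assumes rdflib; file names and the function name are placeholders.
from rdflib import Graph

def diff_against_vivo(new_data_path, current_vivo_path,
                      additions_path="additions.nt",
                      retractions_path="retractions.nt"):
    """Compare newly harvested RDF with the RDF already in the ingest VIVO
    and write the triples to add and to retract as N-Triples files."""
    new_graph = Graph().parse(new_data_path, format="nt")       # freshly harvested data
    vivo_graph = Graph().parse(current_vivo_path, format="nt")  # export from the ingest VIVO

    # Plain set difference on triples; note that blank nodes will not
    # match across the two graphs with this approach.
    additions = new_graph - vivo_graph      # in the new data, not yet in VIVO
    retractions = vivo_graph - new_graph    # in VIVO, absent from the new data

    additions.serialize(destination=additions_path, format="nt")
    retractions.serialize(destination=retractions_path, format="nt")
    return len(additions), len(retractions)

if __name__ == "__main__":
    added, dropped = diff_against_vivo("harvested_people.nt", "vivo_people_export.nt")
    print(f"{added} triples to add, {dropped} triples to retract -- review before applying")
```

In practice the retraction side would be scoped to the classes and predicates the ingest source is authoritative for, so that statements created by manual editing are not removed.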
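One possible way to apply the reviewed files to production is through VIVO's SPARQL Update API, if it is enabled on the production instance. The endpoint path, the email/password/update parameters, and the target graph URI below are assumptions to be checked against the VIVO version in use; they are not part of the Cornell scripts described above.

```python
# Illustrative sketch of applying reviewed add/retract files to production.
# The /api/sparqlUpdate endpoint, its email/password/update parameters, and
# the target graph URI are assumptions -- verify against your VIVO version.
import requests

VIVO_URL = "https://vivo.example.edu/vivo"  # placeholder
TARGET_GRAPH = "http://vitro.mannlib.cornell.edu/default/vitro-kb-2"  # assumed content graph

def post_update(update_text, email, password):
    resp = requests.post(
        VIVO_URL + "/api/sparqlUpdate",
        data={"email": email, "password": password, "update": update_text},
    )
    resp.raise_for_status()

def apply_reviewed_files(additions_path, retractions_path, email, password):
    """Send the reviewed N-Triples as SPARQL DELETE DATA / INSERT DATA updates.
    Large files would need to be split into batches; blank nodes are not
    permitted in DELETE DATA."""
    with open(retractions_path) as f:
        retractions = f.read()
    with open(additions_path) as f:
        additions = f.read()

    if retractions.strip():
        post_update("DELETE DATA { GRAPH <%s> { %s } }" % (TARGET_GRAPH, retractions),
                    email, password)
    if additions.strip():
        post_update("INSERT DATA { GRAPH <%s> { %s } }" % (TARGET_GRAPH, additions),
                    email, password)
```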