...

These systems of record are often silos used for a defined set of business purposes such as personnel and payroll, grants administration, course registration and management, an institutional repository, news and communications, event calendar(s), or extension.  Even when the same software platform is used, local metadata requirements and functional customizations may make any data source unique.

For this reason, coupled with variations in local technology skills and support environments, the VIVO community has not developed a cookie-cutter, one-size-fits-all solution for ingest.

Here's a rough outline of the approach we recommend for people new to VIVO when they start thinking about data:

  1. Look at other VIVOs to see what data other people have loaded and how it appears
  2. Learn the basics about RDF and the Semantic Web (there's an excellent book) – write 10 lines of RDF
  3. Download and install VIVO on a PC or Mac, most easily through the latest VIVO Vagrant instance
  4. Add one person, a position, a department, and some keywords using the VIVO interactive editor
  5. Export the data as RDF and study it – what did you think you were entering, and how different is the structure that VIVO creates?
  6. Add three more people, their positions, and a couple more departments. Try exporting again to get used to the RDF VIVO is creating.
  7. Try typing a small RDF file with a few new records, using the same URI that VIVO created if you are referring to an existing entity (see the sketch after this list). Load that RDF into VIVO through the Add/Remove RDF command on the Site Admin menu – does it look right? If not, double-check your typing.
  8. Repeat this with a publication to learn about the Authorship structure – enter it by hand, study it, hand-edit a couple of new records, ingest, and test.
  9. Don't be afraid to start over with a clean database – you are learning, not going directly to production.
  10. When you start to feel comfortable, think about what data you want to start with – perhaps people and their positions and titles. Don't start with the most challenging and complex data first.
  11. Work with a tool like Karma to learn semantic modeling and produce RDF, however simple, without writing your own code.
  12. Load the data from Karma into your VIVO – does it look right? If not, look at the differences in the RDF – they may be subtle.
  13. Then start looking more deeply at the different ingest methods described later in this section of the wiki and elsewhere.
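
To make the hand-typing step concrete, here is a minimal Turtle sketch of the kind of small RDF file you might load through Add/Remove RDF. The URIs and names are invented, and the exact classes and properties (for positions, and likewise for the Authorship structure) vary between VIVO ontology versions, so treat this as illustrative and compare it against the RDF your own VIVO exports.

    @prefix vivo: <http://vivoweb.org/ontology/core#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://vivo.example.edu/individual/> .

    # A person – reuse the URI your VIVO minted if this person already exists
    ex:n100 a foaf:Person ;
        rdfs:label "Smith, Jane" .

    # A department
    ex:n200 a foaf:Organization ;
        rdfs:label "Department of Chemistry" .

    # A position acting as the contextual node that relates the person to the department
    ex:n300 a vivo:Position ;
        rdfs:label "Assistant Professor" ;
        vivo:relates ex:n100 , ex:n200 .

    # The Authorship structure works the same way: an Authorship individual
    # relates a person and a publication rather than linking them directly.

Loading a file like this and then exporting it again is a quick way to check whether your hand-typed structure matches what VIVO itself creates.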

Cleaning data prior to loading

If you have experience with data, you'll no doubt be familiar with having to clean it before importing it into software designed to display it publicly. There will be duplicates, uncontrolled name variants, missing data, misspellings, capitalization differences, oddball characters, international differences in names, broken links, and very likely other problems.
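
As a small, hypothetical illustration of why this matters: when the same person arrives from two source systems with an uncontrolled name variant, nothing stops the data from becoming two separate individuals in RDF (URIs and names invented here):

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://vivo.example.edu/individual/> .

    # From the HR feed
    ex:n4001 a foaf:Person ;
        rdfs:label "Smith, Jane A." .

    # From the publications feed – the same person, but a different URI and label,
    # so VIVO will display two people unless the records are reconciled before loading
    ex:n7393 a foaf:Person ;
        rdfs:label "Jane Smith" .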

VIVO cannot fix these problems for you, since in almost all cases it has no means of distinguishing an important subtle variation from an error. In the Semantic Web, 'inference' does not refer to the ability to second-guess intent, nor does 'reasoning' mean invoking artificial intelligence.

Furthermore, it's almost always easier to fix dirty data before it goes into VIVO than to find and fix it after loading.

Matching against data already in VIVO

...