Introduction

You've looked at VIVO, you've seen VIVO in action at other universities or organizations, you've downloaded and installed the code. What next? How do you get information about your institution into your VIVO?

...

How big is your organization? Some smaller ones have implemented VIVO only through interactive editing – they enter every person, publication, organizational unit, grant, and event they wish to show up, and they keep up with changes "manually" as well. This approach works well for organizations with under 100 people or so, especially if you have staff or student employees who are good at data entry and enjoy learning more about the people and the research. There's something of an inverse correlation with age – students can be blazingly fast with data entry, employing multiple windows and copying and pasting content. The site takes shape before your eyes and it's easy to measure progress and, after a bit of practice, predict how long the process will take.
- This approach may also be a good way to develop a working prototype with local data to use in making your case for a full-scale effort. The process of data entry is tedious but a very good way to learn the structure inherent in VIVO.
- We recommend that people new to RDF and ontologies enter representative sample data by hand and then export it in one of the more readable RDF formats such as N3, N-Triples, or Turtle. This is an excellent way to compare what you see on the screen with the data VIVO will actually produce – and when you know your target, it's easier to decide how best to develop a more automated ingest process.
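As a rough aid for that comparison, the sketch below groups the statements in a small N-Triples export by subject, so that each entity can be read side by side with its page in VIVO. It is only an illustration: the sample triples use made-up example.org URIs rather than real VIVO output, the function name is hypothetical, and the regular expression handles only simple statements whose subject and predicate are URIs.

```python
import re
from collections import defaultdict

# Matches simple N-Triples statements: <subject> <predicate> object .
# Assumption: blank-node subjects and other less common forms are ignored.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')

def group_by_subject(ntriples_text):
    """Return {subject: [(predicate, object), ...]} for a small N-Triples export."""
    grouped = defaultdict(list)
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match:
            subject, predicate, obj = match.groups()
            grouped[subject].append((predicate, obj))
    return dict(grouped)

# Hypothetical sample data, NOT real VIVO output.
SAMPLE = """
<http://example.org/individual/n1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/individual/n1> <http://www.w3.org/2000/01/rdf-schema#label> "Smith, Charlotte" .
<http://example.org/individual/n2> <http://www.w3.org/2000/01/rdf-schema#label> "Animation Techniques" .
"""

if __name__ == "__main__":
    for subject, statements in group_by_subject(SAMPLE).items():
        print(subject)
        for predicate, obj in statements:
            print("   ", predicate, obj)
```

For anything beyond a quick look, a real RDF library would be a safer choice than a regular expression.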
The interactive approach of manual data entry will obviously not work at large institutions, or where neither staff time nor a ready pool of student editors is available. There are also many advantages to developing more automated means of ingest and updating, including data consistency and the ability to replace data quickly and on a predictable timetable. Some institutions have opted to use the Karma data integration tool, which produces RDF from tabular data exported from relational databases by letting you model it in an interactive environment. An advantage of Karma is that its interactive visual environment helps in understanding how the ontologies work.
What are your available data sources? Some organizations have made good institutional data a priority, and others struggle with legacy systems lacking consistent identifiers or common definitions for important categorizations such as distinct types of units or employment positions. It is very important that data you receive from legacy systems be examined, and that identifiers and names for people and organizational units be standardised and made consistent across the various systems. Those legacy systems may use different identifiers or codes and different names for the same organizational unit, and you want to ensure that the data is clean before you start mapping it to the ontology. Another aspect of legacy data is that you may have to make some inquiries to find the right people to contact to learn what data sources are available, and the stakeholders on your VIVO project may need to request access to that data.
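One way to start reconciling organizational units across systems is to build a crosswalk between their codes, matching on a normalized form of the unit name and listing whatever fails to match for manual review. The sketch below assumes two hypothetical sources (an HR system and a finance system) with invented codes and names; the normalization rules are illustrative and would need tuning against real data.

```python
import re

def normalize_name(name):
    """Lowercase, expand '&', drop punctuation, and collapse whitespace."""
    name = name.casefold().replace("&", " and ")
    name = re.sub(r"[^\w\s]", " ", name)
    return " ".join(name.split())

def build_crosswalk(hr_units, finance_units):
    """Map HR unit codes to finance unit codes where normalized names agree.

    Returns (crosswalk, unmatched_hr_codes); unmatched codes need human review.
    """
    finance_by_name = {normalize_name(name): code
                       for code, name in finance_units.items()}
    crosswalk, unmatched = {}, []
    for code, name in hr_units.items():
        match = finance_by_name.get(normalize_name(name))
        if match is None:
            unmatched.append(code)
        else:
            crosswalk[code] = match
    return crosswalk, unmatched

# Hypothetical codes and names, for illustration only.
HR_UNITS = {
    "D0417": "Dept. of Animal Science",
    "D0520": "Film & Media Studies",
    "D0999": "Office of Research",
}
FINANCE_UNITS = {
    "ANSC": "Dept of Animal Science",
    "FILM": "Film and Media Studies",
    "RSCH": "Research Office",
}

if __name__ == "__main__":
    crosswalk, unmatched = build_crosswalk(HR_UNITS, FINANCE_UNITS)
    print("matched:", crosswalk)
    print("needs review:", unmatched)
```

Exact matching on normalized names is deliberately conservative: a unit such as "Office of Research" versus "Research Office" is reported for review instead of being guessed at.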

...

id	title	publication date	author	publisher	pages
497531	Cartoon Animation	1967	Wilcox, George	HB Press	237
501378	Animation Techniques	1989	Smith, Charlotte and Wilcox, George	Cinema Press	359
391783	Digital Animation	2005	Ivar, Samuel	Digital Logic, Inc.	327
34682	Dairy Barn Automation	2011	Wilcox, G.P.	University of Minnesota Press	403

VIVO stores the book, each author, and the publisher as independent entities related to one another. This enables information about the book, authors, and publisher to be queried and displayed independently, a key feature of the semantic data model.
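To make the idea of independent entities concrete, the sketch below reads the sample rows and pulls each publisher out into its own record, leaving every book with a reference to a publisher key instead of a name string. This is only an illustration of the separation, not how VIVO itself stores data; the `publisher-N` keys and the function name are invented for the example, and authors are left aside here.

```python
import csv
import io

# The sample spreadsheet from this page, as tab-separated text.
TABLE = (
    "id\ttitle\tpublication date\tauthor\tpublisher\tpages\n"
    "497531\tCartoon Animation\t1967\tWilcox, George\tHB Press\t237\n"
    "501378\tAnimation Techniques\t1989\tSmith, Charlotte and Wilcox, George\tCinema Press\t359\n"
    "391783\tDigital Animation\t2005\tIvar, Samuel\tDigital Logic, Inc.\t327\n"
    "34682\tDairy Barn Automation\t2011\tWilcox, G.P.\tUniversity of Minnesota Press\t403\n"
)

def extract_publishers(rows):
    """Split rows into publisher entities and books that refer to them by key.

    Returns (publishers, books), where publishers maps each name to a locally
    minted key (hypothetical 'publisher-N' scheme).
    """
    publishers = {}
    books = []
    for row in rows:
        name = row["publisher"].strip()
        # Reuse the existing key if this publisher was already seen.
        key = publishers.setdefault(name, f"publisher-{len(publishers) + 1}")
        books.append({"id": row["id"], "title": row["title"], "publisher": key})
    return publishers, books

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(TABLE), delimiter="\t"))
    publishers, books = extract_publishers(rows)
    for book in books:
        print(book)
    print(publishers)
```

Because each publisher now has its own key, two books from the same press point at one shared record, which is what allows the publisher to be queried and displayed on its own.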

We have also introduced a common problem with spreadsheets – when a cell contains more than one value. We need a way to connect the book, "Animation Techniques," with two authors, and to indicate that Charlotte Smith is the first author and George Wilcox the second.
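A minimal sketch of one way to handle such a cell: split it into individual names and record each author's position, producing one authorship record per book and author. It assumes authors are separated by the word "and", as in the sample table; real data often uses semicolons or other delimiters, and a name that itself contains " and " would be split incorrectly. The function names are hypothetical.

```python
def split_authors(cell):
    """Split a multi-valued author cell on ' and ', keeping order as rank."""
    names = [name.strip() for name in cell.split(" and ") if name.strip()]
    return [(rank, name) for rank, name in enumerate(names, start=1)]

def authorships(rows):
    """Yield one (book_id, rank, author_name) record per author of each book."""
    for row in rows:
        for rank, name in split_authors(row["author"]):
            yield (row["id"], rank, name)

# Two rows from the sample table on this page.
ROWS = [
    {"id": "497531", "author": "Wilcox, George"},
    {"id": "501378", "author": "Smith, Charlotte and Wilcox, George"},
]

if __name__ == "__main__":
    for record in authorships(ROWS):
        print(record)
```

Keeping the rank on the link between book and author, instead of on the author, lets the same person be first author of one book and second author of another.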

This example also points out another challenge in working with data – it's not always clear when values that appear similar actually represent the same entity, whether a person, organization, title, journal, or event. It would be easy to assume the George Wilcox in the first entry is the same as G.P. Wilcox in the fourth, but they are writing about very different topics. For a small organization, it may be easy to disambiguate authors, but this becomes a major challenge at the scale of a major research university.
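A cautious first step toward disambiguation is to flag name variants that might refer to the same person, without merging them automatically. The sketch below groups names that share a family name and first initial; it assumes names are written "Family, Given" as in the sample table, and the function names are hypothetical. A match here is only a candidate for review, since other evidence (topic, affiliation, identifiers) is needed to decide.

```python
def name_key(name):
    """Reduce 'Family, Given' to (family, first initial), both lowercased."""
    family, _, given = name.partition(",")
    given = given.strip()
    initial = given[0].casefold() if given else ""
    return (family.strip().casefold(), initial)

def candidate_matches(names):
    """Group distinct name strings that share a family name and first initial.

    Returns only groups with more than one variant; these are candidates
    for human review, not confirmed matches.
    """
    groups = {}
    for name in names:
        groups.setdefault(name_key(name), set()).add(name)
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}

# Author names as they appear in the sample table on this page.
NAMES = ["Wilcox, George", "Smith, Charlotte", "Wilcox, George",
         "Ivar, Samuel", "Wilcox, G.P."]

if __name__ == "__main__":
    for key, variants in candidate_matches(NAMES).items():
        print(key, "->", variants)
```

In the sample data this flags "Wilcox, George" and "Wilcox, G.P." as a pair to check, which is exactly the case where the differing subject matter suggests they may be two people.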

...

Further topics

See also

Under Ingesting and maintaining data
How to plan data ingest for VIVO
How to manage data cleanup in VIVO
Ingest tools: home brew or off the shelf?
Typical ingest processes
Challenges for data ingest
Monitoring for quality
Under Maintaining VIVO
VIVO Data Management
