...

Title issues: Unicode, LaTeX, MS Word, foreign languages. Attempt to store the language provided by the provider. Joined fields for titles with multiple titles; these can be stored as a list in the extra class.

Normalizers can guess title, identifier, or DOI. Normalizers are usually conservative.

MC Idea: data inspectors. Write Elasticsearch queries to get percentages of populated/vacant fields, by provider, by date range. This would show the density of field values in the normalized data and could be used to draw control charts of field-value density. Mirror the values.
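The data-inspector idea above can be sketched in a few lines. A minimal version, assuming the records have already been fetched (the field names here are illustrative, not the actual SHARE schema):

```python
# Sketch of the data-inspector idea: for a batch of harvested records,
# report the fraction that have each field populated. Against a live
# Elasticsearch index the same numbers could come from an "exists"
# filter per field; this version just works on fetched documents.

def field_density(records, fields):
    """Return {field: fraction of records with a non-empty value}."""
    if not records:
        return {field: 0.0 for field in fields}
    counts = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            if record.get(field):  # None, "", and [] all count as vacant
                counts[field] += 1
    return {field: counts[field] / len(records) for field in fields}
```

Grouping the input records by provider or by date range before calling this would give the per-provider, per-period densities needed for control charts.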

MC Idea: data inspectors. Identifiers are a problem; they often come in "random" forms.

MC Idea: data inspectors. Feed the results back to the providers. The providers may be able to suggest enhancements to the harvesters and normalizers.

...

Tuesday – Hackathon Day 2

Studied the SHARE API. Learned a bit about Elasticsearch. Investigated sharepa (it was not ready to be used for version 2; by the end of the community meeting Erin had upgraded it to work with version 2). Wrote the share-data-inspector. See https://github.com/mconlon17/share-data-inspector

Wednesday – Community Meeting Day 1

...

Internet of Things – forget it. Seems helpful. The important thing is the monitoring and managing of people. Us. Companies must have a lot of knowledge about us. Difficult to enter the market – these companies have 18 years of data on us.

...

https://osf.io/share  117 providers, 7 million records (but how many unique? Some feeds are completely duplicative – Dryad and DataCite, for example). ClinicalTrials.gov, Zenodo, PLoS, Arxiv.org, Figshare, and 50 institutional providers.

...

http://osf.io/share/atom/?q="maryann martone"OR"maryann e martone"

http://osf.io/share/atom/?q="m conlon"OR"Michael Conlon"OR"Mike Conlon"
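Query strings like the two above can be assembled programmatically. A small sketch, with the endpoint and quoted-OR query syntax taken verbatim from these notes rather than from SHARE documentation:

```python
# Build a SHARE Atom feed URL that ORs several quoted name variants,
# matching the query style shown above. The endpoint and syntax are
# copied from the notes, not from official SHARE API documentation.
from urllib.parse import quote

def atom_feed_url(name_variants):
    """Return a feed URL whose q parameter ORs quoted name phrases."""
    q = "OR".join('"{}"'.format(name) for name in name_variants)
    return "http://osf.io/share/atom/?q=" + quote(q)
```

A digest service such as Blogtrottr could then watch the resulting feed URL.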

http://Blogtrottr.com  for sending a feed digest to a mail address on a regular schedule.

...

Lisa Johnson, University of Minnesota. Data Curation Network. Rise of data sharing culture. Role of librarians – discipline-specific expertise, technology expertise. Data Curation Network: Minnesota, Cornell, Penn State, Illinois, Michigan, WUSTL. Collecting and reporting data curation experiences, metrics for results. http://sites.google.com/DataCurationNetwork

Anita de Waard, Elsevier

Hackathon Report back

Institutional Dashboard

...

Does DataCite totally duplicate Dryad for data set consumption?  Metadata might be different.  Similar questions apply to other overlapping services – Dataverse and DataCite.

Alexander Garcia Castro – VIVO and SHARE

...

SHARE is chaotic and promiscuous.  VIVO is chaste, great precision.

Research Hub concept to engage faculty.  Claiming, feedback to metadata providers.

SHARE Scopus Mendeley GitHub

...

Search -> Claim -> Add -> Connect Research Objects -> Social Connections -> Done

VIVO needs an engagement strategy.  Beautiful, clear models; open, reusable semantic data.

ORCiD has minimal faculty engagement.  Sign up, disambiguate.  No reason for a faculty member to go back to ORCiD after initial set-up.  Identifier only.  OpenVIVO could be, should be, more engaging.

SHARE is big, but messy.  Also needs an engagement strategy.

...

OSF as a platform for scholarly workflow.  Slides available here: http://osf.io/9kcd3

MC: OSF Needs:

  1. Identity – benefit for faculty: collaborators around the world
  2. Extensible/local workflow – benefit for faculty: reduce regulatory/administrative burden
  3. GitHub issues – benefit for faculty: improve ability to manage team/project

Prue Adler, Brandon Butler, Metadata Copyright Legal Guidance

...

Two modes of research: exploratory, confirmatory

Preregistration challenge:  http://cos.io/prereg

Registered reports:  Design -> Peer Review -> Collect and Analyze -> Report -> Publish.  Aligns all incentives.

Tyler Walters – Closing Comments

VIVO is a fundamental part of the university's infrastructure

Some Observations

The SHARE information environment is maturing rapidly.  They have a large staff of talented young developers augmented by a very large group of talented interns from UVA.  They have a tremendous amount of data, a growing set of providers, sophisticated version control, and a change set architecture for all their data which allows them to reharvest data to improve the quality of the current version.  The API is easy to use.  ElasticSearch works very well.  They recognize the need to disambiguate their data and are planning heuristics, use of identifiers, and use of curation associations at libraries to merge entities.

Their data model is shallow, and driven by the fields that are commonly available from the providers.  VIVO is very interesting to SHARE as it provides missing semantics and a very deep data model. The two efforts – VIVO and SHARE – are strikingly complementary.  VIVO would benefit tremendously from "world" metadata collected and curated by SHARE.  SHARE would benefit tremendously from VIVO's institutional footprint and the potential to feed metadata improvements back to SHARE.

Additional Conversations

I spoke with numerous COS staffers regarding data specification, semantics, use of identifiers, and the "golden query": "Return all the metadata for institution x."  Given a reasonable result for this query, institutions could contribute to metadata improvement and use SHARE metadata for VIVO.
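A sketch of what the "golden query" might look like as an Elasticsearch request body. The field name `affiliations` is a hypothetical placeholder, since the actual SHARE v2 field for institutional affiliation is not specified in these notes:

```python
# Hypothetical Elasticsearch body for the "golden query":
# "Return all the metadata for institution x". The field name
# "affiliations" is an assumption, not the real SHARE schema.

def golden_query(institution, size=100):
    """Build a match_phrase query for one institution's records."""
    return {
        "query": {"match_phrase": {"affiliations": institution}},
        "size": size,  # page size; real use would paginate or scroll
    }
```

The hard part, as discussed with COS, is not the query itself but having affiliation populated and disambiguated well enough for the result to be trustworthy.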

Next Steps

There is great interest in creating a "virtuous cycle" of data movement between VIVO and SHARE.  The existing VIVO Harvester for SHARE will be upgraded by SHARE to their version 2 data model.  VIVO will enable harvesting of OpenVIVO for SHARE.  SHARE2VIVO will be upgraded to allow users of OpenVIVO to update their profiles from SHARE data – this will require additional specificity in the SHARE data.
