SHARE Hackathon and Community Meeting

July 11-14, 2016
Charlottesville, VA

Monday, July 11 – Hackathon Day 1

Jeff Spies

SHARE version 2.  More specificity about the contents of the database

Need interfaces for SHARE.  SHARE does not want to be an interface to the scholarly work

Data needs discovery and refinement

Rick Johnson

Exciting time to be involved with SHARE

Erin Braswell

OSF work space.  Code at GitHub.

Provider  -> Harvester -> raw_data -> Normalizer -> normalized_data -> changes -> change_set -> versions -> entities

The Harvester gets the data from the provider, using date restrictions to fetch only "new" data. The Normalizer creates the values that can go into the SHARE data models.
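The harvest-then-normalize flow described above can be sketched roughly as follows. This is only an illustration, not SHARE's actual code: the `harvest` and `normalize` functions, the record fields, and the in-memory provider list are all hypothetical.

```python
from datetime import date

# Hypothetical provider records; a real harvester would fetch these remotely.
PROVIDER_RECORDS = [
    {"id": "a1", "updated": date(2016, 7, 1), "title": "  Old record "},
    {"id": "a2", "updated": date(2016, 7, 10), "title": "New  record"},
]


def harvest(records, start, end):
    """Return raw records whose update date falls within [start, end]."""
    return [r for r in records if start <= r["updated"] <= end]


def normalize(raw):
    """Map a raw record onto a (hypothetical) normalized shape."""
    return {
        "provider_id": raw["id"],
        # Collapse runs of whitespace and trim the ends of the title.
        "title": " ".join(raw["title"].split()),
        # Anything that does not fit the model could be kept here.
        "extra": {},
    }


# Date restriction: only records updated during the chosen window are "new".
raw_data = harvest(PROVIDER_RECORDS, date(2016, 7, 5), date(2016, 7, 14))
normalized_data = [normalize(r) for r in raw_data]
```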

Title issues:  Unicode, LaTeX, MS Word, foreign languages.  Attempt to store the language provided by the provider.  Joined fields for records with multiple titles.  Can be stored as a list in the extra class.

Normalizers can guess title or identifier or DOI.  Normalizers are usually conservative.
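One way a conservative guess could look for DOIs is sketched below. It is an assumption for illustration, not how SHARE's normalizers work: the `guess_doi` helper is hypothetical, and the pattern only covers the common modern form "10.<4 to 9 digits>/<suffix>". The guess is returned only when exactly one distinct candidate is found.

```python
import re

# Assumed pattern for modern DOIs: "10." + 4-9 digits + "/" + a non-space suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")


def guess_doi(identifiers):
    """Return a DOI only if exactly one distinct candidate is found, else None."""
    candidates = set()
    for value in identifiers:
        match = DOI_PATTERN.search(value)
        if match:
            # Strip trailing punctuation and lowercase (DOIs are case-insensitive).
            candidates.add(match.group(0).rstrip(".,;").lower())
    # Conservative: refuse to guess when there are zero or several candidates.
    return candidates.pop() if len(candidates) == 1 else None
```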

Idea:  data inspectors:  Write Elasticsearch queries to get the percentage of populated/vacant fields, by provider and by date range.  This would show the density of field values in the normalized data and could be used to draw control charts of field-value density.  Mirror the values.
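The density calculation behind this idea can be sketched in plain Python, independent of any search backend. The `field_density` function and the record layout (a `provider` key plus arbitrary fields) are assumptions for illustration; a date-range restriction would simply filter the records before they are passed in.

```python
from collections import defaultdict

# Values treated as "vacant"; anything else counts as populated.
EMPTY_VALUES = (None, "", [], {})


def field_density(records, fields):
    """Percent of records, per provider, in which each field is populated."""
    totals = defaultdict(int)
    filled = defaultdict(lambda: defaultdict(int))
    for rec in records:
        provider = rec["provider"]
        totals[provider] += 1
        for field in fields:
            if rec.get(field) not in EMPTY_VALUES:
                filled[provider][field] += 1
    return {
        provider: {f: 100.0 * filled[provider][f] / totals[provider] for f in fields}
        for provider in totals
    }


# Hypothetical normalized records from two providers.
records = [
    {"provider": "p1", "title": "A", "doi": "10.1234/x"},
    {"provider": "p1", "title": "B", "doi": ""},
    {"provider": "p2", "title": None, "doi": None},
]
density = field_density(records, ["title", "doi"])
```

Tracking these percentages per harvest run would give the series needed for a control chart.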

Idea:  data inspectors:  Identifiers are a problem, often come in "random". 

Idea:  data inspectors:  feed the results back to the providers.  The providers may be able to suggest enhancements to the harvesters and normalizers.

Documents can be updated, matched on the provider's id.  If metadata comes in for a record that already exists, COS versions the record and provides the most current version unless the query asks for versions.
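The update behaviour described above might be modelled along these lines. This is a minimal in-memory sketch under stated assumptions, not the SHARE implementation: the `VersionedStore` class and its `upsert` and `get` methods are hypothetical names.

```python
class VersionedStore:
    """Keeps every version of a record, keyed by the provider's id."""

    def __init__(self):
        # provider_id -> list of metadata dicts, oldest first
        self._versions = {}

    def upsert(self, provider_id, metadata):
        """Store metadata as a new version; earlier versions are kept."""
        self._versions.setdefault(provider_id, []).append(dict(metadata))

    def get(self, provider_id, include_versions=False):
        """Return the most current version, or the full history on request."""
        history = self._versions[provider_id]
        return list(history) if include_versions else history[-1]


store = VersionedStore()
store.upsert("a1", {"title": "Draft title"})
store.upsert("a1", {"title": "Final title"})
current = store.get("a1")
history = store.get("a1", include_versions=True)
```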

See https://staging-share.osf.io/api/

 
