...

Title issues:  Unicode, LaTeX, MS Word formatting, foreign languages.  Attempt to store the language supplied by the provider.  Works with multiple titles currently have them joined into one field; the titles can instead be stored as a list in the extra class.
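A minimal sketch of the title handling described above, assuming titles arrive as plain strings and that multiple titles are joined with a separator such as " | ". The separator, the helper names `normalize_title` and `split_titles`, and the layout of the `extra` dictionary are all hypothetical, not SHARE's actual schema. The sketch applies Unicode NFC normalization, collapses whitespace, keeps the first title as the primary one, and stores every title plus the provider-supplied language under `extra`. It does not attempt LaTeX or MS Word cleanup.

```python
import unicodedata
from typing import Optional


def normalize_title(raw: str) -> str:
    # NFC composes sequences such as "e" + combining acute accent into a single
    # "é" code point, so visually identical titles compare equal.
    text = unicodedata.normalize("NFC", raw)
    # Collapse tabs, newlines, non-breaking spaces and repeated spaces.
    return " ".join(text.split())


def split_titles(raw: str, language: Optional[str] = None,
                 separator: str = " | ") -> dict:
    # Hypothetical separator: real providers may join titles differently.
    titles = [normalize_title(part) for part in raw.split(separator)]
    titles = [t for t in titles if t]  # drop empty fragments
    return {
        "title": titles[0] if titles else "",
        # Keep every title and the provider's language rather than discarding them.
        "extra": {"titles": titles, "language": language},
    }
```

For example, `split_titles("Cafe\u0301  culture | Kaffeehauskultur", language="en")` yields the primary title "Café culture" with both titles kept in `extra["titles"]`.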

Normalizers can guess which values are titles, identifiers, or DOIs.  The normalizers are usually conservative.
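One way a conservative normalizer might decide that an identifier string is a DOI is sketched below, assuming identifiers arrive as free text. The regular expression follows the pattern Crossref commonly cites for modern DOIs (`10.`, a 4 to 9 digit registrant code, `/`, then a suffix); the `guess_doi` helper and the list of stripped prefixes are hypothetical. In keeping with a conservative normalizer, it returns `None` instead of guessing when the whole string does not match.

```python
import re
from typing import Optional

# Common wrappers around a bare DOI; stripped (case-insensitively) before matching.
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/", "doi:")

# Pattern for modern DOIs; fullmatch below means the entire string must be a DOI.
_DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def guess_doi(value: str) -> Optional[str]:
    candidate = value.strip()
    lowered = candidate.lower()
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            candidate = candidate[len(prefix):].strip()
            break
    if _DOI_RE.fullmatch(candidate):
        # DOIs are case-insensitive, so lower-case them for comparison.
        return candidate.lower()
    return None  # not confident: leave the value unclassified
```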

MC Idea:  data inspectors:  Write Elasticsearch queries that report the percentage of populated and vacant fields, by provider and by date range.  These would show the density of field values in the normalized data and could be used to draw control charts of field-value density.  Mirror the values.
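A data inspector of this kind could be sketched as one Elasticsearch aggregation: restrict to a date range with a `range` query, bucket by provider with a `terms` aggregation, and count documents in which each field `exists` with a `filter` sub-aggregation. The field names (`date_updated`, `provider`, and the inspected fields) are hypothetical placeholders, not the actual SHARE mapping. `field_density_query` builds the request body, `densities` turns a response into per-provider percentages of populated fields (vacant is 100 minus populated), and `p_chart_limits` computes three-sigma limits of a standard p-chart. The p-chart is a swapped-in choice; the notes do not say which control chart to use.

```python
import math
from typing import Dict, List, Tuple


def field_density_query(fields: List[str], start: str, end: str,
                        date_field: str = "date_updated",
                        provider_field: str = "provider",
                        max_providers: int = 200) -> dict:
    """Request body counting, per provider, documents where each field exists."""
    return {
        "size": 0,  # only the aggregations are needed, not the documents
        "query": {"range": {date_field: {"gte": start, "lt": end}}},
        "aggs": {
            "by_provider": {
                "terms": {"field": provider_field, "size": max_providers},
                # One filter sub-aggregation per inspected field.
                "aggs": {f"has_{f}": {"filter": {"exists": {"field": f}}}
                         for f in fields},
            }
        },
    }


def densities(response: dict, fields: List[str]) -> Dict[str, Dict[str, float]]:
    """Convert an aggregation response into {provider: {field: percent populated}}."""
    result: Dict[str, Dict[str, float]] = {}
    for bucket in response["aggregations"]["by_provider"]["buckets"]:
        total = bucket["doc_count"]
        result[bucket["key"]] = {
            f: 100.0 * bucket[f"has_{f}"]["doc_count"] / total if total else 0.0
            for f in fields
        }
    return result


def p_chart_limits(p_bar: float, n: int) -> Tuple[float, float]:
    """Three-sigma limits for a proportion p_bar (0..1) observed over n documents."""
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)
    return max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)
```

To chart a field over time, one would run the query once per period, divide each percentage by 100, and flag periods whose proportion falls outside the limits computed from the long-run average.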

MC Idea:  data inspectors:  Identifiers are a problem; they often come in "random".
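When identifiers arrive in arbitrary fields and formats, an inspector could at least classify each value before counting. The sketch below, with a hypothetical `classify_identifier` helper and category names, recognizes ORCID iDs (validated with the ISO 7064 MOD 11-2 check character that ORCID documents for its iDs), DOIs, and other URLs, and reports anything else as "unknown". The category set is an illustrative assumption, not a SHARE vocabulary.

```python
import re

# Bare or URL-form ORCID iD: four groups of four, last character a digit or X.
_ORCID_RE = re.compile(r"(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])")
_DOI_RE = re.compile(r"(?:doi:|https?://(?:dx\.)?doi\.org/)?10\.\d{4,9}/\S+",
                     re.IGNORECASE)


def orcid_checksum_ok(orcid: str) -> bool:
    """Check the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def classify_identifier(value: str) -> str:
    v = value.strip()
    m = _ORCID_RE.fullmatch(v)
    if m and orcid_checksum_ok(m.group(1)):
        return "orcid"
    if _DOI_RE.fullmatch(v):
        return "doi"
    if re.match(r"https?://", v, re.IGNORECASE):
        return "url"
    return "unknown"
```

A malformed ORCID iD (wrong check character) is deliberately not labeled "orcid", which is the kind of result worth feeding back to a provider.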

MC Idea:  data inspectors:  feed the results back to the providers.  The providers may be able to suggest enhancements to the harvesters and normalizers.

...

Tuesday – Hackathon Day 2

Studied the SHARE API.  Learned a bit about Elasticsearch.  Investigated sharepa, but it was not yet ready for use with version 2; by the end of the community meeting Erin had upgraded it to work with version 2.  Wrote the share-data-inspector.  Upload to GitHub and provide link here.

...

Internet of Things – forget it.  Seems helpful.  The important thing is the monitoring and managing of people.  Us.  Companies must have a lot of knowledge about us.  It is difficult to enter the market; these companies have 18 years of data on us.

...

https://osf.io/share: 117 providers, 7 million records (but how many are unique?  Some feeds are completely duplicative, for example Dryad and DataCite).  Providers include ClinicalTrials.gov, Zenodo, PLoS, arXiv.org, Figshare, and 50 institutional providers.

...

http://osf.io/share/atom/?q="maryann martone"OR"maryann e martone"

http://osf.io/share/atom/?q="m conlon"OR"Michael Conlon"OR"Mike Conlon"
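The queries above put several quoted name variants into the `q` parameter, joined by `OR`.  A small standard-library sketch of building such a URL with correct percent-encoding, and of reading entry titles from a returned Atom feed, follows.  The base URL is copied from the notes and may no longer be served; the helper names `name_query_url` and `entry_titles` are hypothetical, and fetching the feed (e.g. with `urllib.request.urlopen`) is left out.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import List

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}
BASE_URL = "http://osf.io/share/atom/"  # from the notes; the endpoint may have moved


def name_query_url(variants: List[str]) -> str:
    # Quote each name variant as a phrase and join them with OR, as in the notes.
    q = "OR".join(f'"{name}"' for name in variants)
    # urlencode percent-encodes the quotes and turns spaces into "+".
    return BASE_URL + "?" + urllib.parse.urlencode({"q": q})


def entry_titles(atom_xml: str) -> List[str]:
    # Atom elements live in the Atom namespace, so lookups must use it.
    root = ET.fromstring(atom_xml)
    return [entry.findtext("atom:title", default="", namespaces=ATOM_NS)
            for entry in root.findall("atom:entry", ATOM_NS)]
```

For example, `name_query_url(["maryann martone", "maryann e martone"])` produces `http://osf.io/share/atom/?q=%22maryann+martone%22OR%22maryann+e+martone%22`, a URL that a feed-digest service could poll.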

http://Blogtrottr.com sends a feed digest to an email address on a regular schedule.

...

Lisa Johnson, University of Minnesota.  Data Curation Network.  Rise of Data Sharing Culture.  Role of librarians: discipline-specific expertise, technology expertise.  Data Curation Network: Minnesota, Cornell, Penn State, Illinois, Michigan, WUSTL.  Collecting and reporting data curation experiences, metrics for results.  http://sites.google.com/DataCurationNetwork

Anita de Waard, Elsevier

Hackathon Report back

Institutional Dashboard

...

Does DataCite completely duplicate Dryad for dataset consumption?  The metadata might differ.  Similar questions apply to other overlapping services, such as Dataverse and DataCite.
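One rough way to start answering the duplication question is to compare the sets of DOIs each provider contributes.  The sketch below assumes DOI lists have already been pulled for two providers; the `doi_overlap` helper and its output keys are hypothetical.  It normalizes DOIs (case and common prefixes) and reports the shared count, the fraction of each provider's DOIs that the other also has, and the Jaccard index.  Matching DOIs only shows that the same datasets appear in both feeds; whether their metadata differs would need a field-by-field comparison of the shared records.

```python
from typing import Dict, Iterable

_PREFIXES = ("https://doi.org/", "http://doi.org/",
             "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    # DOIs are case-insensitive; strip common URL or "doi:" wrappers.
    d = doi.strip().lower()
    for prefix in _PREFIXES:
        if d.startswith(prefix):
            d = d[len(prefix):]
            break
    return d.strip()


def doi_overlap(a: Iterable[str], b: Iterable[str]) -> Dict[str, float]:
    set_a = {normalize_doi(x) for x in a if x.strip()}
    set_b = {normalize_doi(x) for x in b if x.strip()}
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": float(len(shared)),
        "share_of_a": len(shared) / len(set_a) if set_a else 0.0,  # a's DOIs also in b
        "share_of_b": len(shared) / len(set_b) if set_b else 0.0,  # b's DOIs also in a
        "jaccard": len(shared) / len(union) if union else 0.0,
    }
```

A `share_of_a` close to 1.0 for Dryad against DataCite would support the "completely duplicative" impression for dataset consumption.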

Alexander Garcia Castro, VIVO and SHARE

...

SHARE is chaotic and promiscuous.  VIVO is chaste, great precision.

Research Hub concept to engage faculty.  Claiming, feedback to metadata providers.

SHARE, Scopus, Mendeley, GitHub

...

Search -> Claim -> Add -> Connect Research Objects -> Social Connections -> Done

VIVO needs an engagement strategy.  It has beautiful, clear models and open, reusable semantic data.

ORCiD has minimal faculty engagement.  Faculty sign up and disambiguate, but there is no reason for a faculty member to go back to ORCiD after the initial set-up.  It is an identifier only.  OpenVIVO could be, and should be, more engaging.

SHARE is big, but messy.  Also needs an engagement strategy.

...

OSF as a platform for scholarly workflow.  Slides available here: http://osf.io/9kcd3

MC: OSF Needs:

  1. Identity – benefit for faculty: collaborators around the world
  2. Extensible/local workflow – benefit for faculty: reduce regulatory/administrative burden
  3. GitHub issues – benefit for faculty: improve ability to manage team/project

Prue Adler, Brandon Butler, Metadata Copyright Legal Guidance

...

The SHARE information environment is maturing rapidly.  They have a large staff of talented young developers augmented by a very large group of talented interns from UVA.  They have a tremendous amount of data, a growing set of providers, sophisticated version control, and a change set architecture for all their data, which allows them to reharvest data to improve the quality of the current version.  The API is easy to use.  Elasticsearch works very well.  They recognize the need to disambiguate their data and are planning heuristics, use of identifiers, and use of curation associations at libraries to merge entities.
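The identifier part of that merging plan can be sketched with union-find (a disjoint-set structure): any two records that share a normalized identifier, such as an ORCID iD, end up in the same entity, and the grouping is transitive.  This illustrates a standard technique, not SHARE's actual disambiguation code; the `merge_by_identifiers` helper, record ids, and identifier strings are hypothetical.  Name-based heuristics and library curation would sit on top of it.

```python
from typing import Dict, List, Set


def merge_by_identifiers(records: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group record ids so that records sharing any identifier form one entity."""
    parent = {rid: rid for rid in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link each record to the first record seen with the same identifier.
    first_seen: Dict[str, str] = {}
    for rid, identifiers in records.items():
        for ident in identifiers:
            if ident in first_seen:
                union(first_seen[ident], rid)
            else:
                first_seen[ident] = rid

    # Collect records under their set representative.
    groups: Dict[str, Set[str]] = {}
    for rid in records:
        groups.setdefault(find(rid), set()).add(rid)
    return list(groups.values())
```

Here a record linked to a second record by an ORCID iD, and to a third by an email address, collapses all three into one entity even though the first and third share nothing directly.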

...