February 24 LD4L Workshop breakout session: Usage Data
facilitator: Paul Deschner
Usage data sources
OCR-ed bibliographies and page rank
ILL usage
Yahoo circ logs
Web analytics (e.g., DPLA UI analytics, esp. contextual granularity)
Search terms as form of usage; also as compared to other usage data
Entities extracted from queries, not simply literal queries themselves
How often a link is traversed; how many times your link has been reconciled in a triple store
Browsed materials
Citations; also citation networks as compared to other usage data
Course-book lists across institutions
StackScore
Makes data muddy
Too many metrics mixed together; need to separate out the metrics
Common metrics needed across institutions
Computational transparency important: metrics and algorithms
Negative usage data at local institution
Important to see what users are looking for but local institution doesn’t have
What doesn’t circulate in-house but is available via ILL
What isn’t read at Columbia but is read at Yale
Usage data runs risk of becoming prescriptive
Blandness of collections when everyone acquires most popular items
Use cases
Keeping tabs on popularity of colleagues’ publications
Usage data as diagnostic tool for targeted collections: highly invested-in parts of collection not being used could drive arranging an exhibition to increase awareness
Scholars doing research on other scholars’ research and publications
Look at when items were used: what was checked out in the last week, month, year, etc.
Link traversals and other link metrics could be sent to link’s source
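Illustration of the "when items were used" use case: a minimal sketch that counts checkouts per item in trailing week, month, and year windows. The log layout, the item ids, and the window lengths (7, 30, 365 days) are assumptions made for the example, not something specified in the session.

```python
from datetime import date

# Hypothetical circulation log: (item_id, checkout_date) pairs.
CHECKOUTS = [
    ("item-a", date(2015, 2, 23)),
    ("item-b", date(2015, 2, 10)),
    ("item-a", date(2014, 6, 1)),
    ("item-c", date(2013, 1, 15)),
]

# Trailing window lengths in days; names and lengths are illustrative choices.
WINDOWS = {"week": 7, "month": 30, "year": 365}


def checkouts_by_window(checkouts, today, windows=WINDOWS):
    """Count checkouts per item within each trailing window ending at `today`."""
    counts = {name: {} for name in windows}
    for item_id, when in checkouts:
        age = (today - when).days
        if age < 0:
            continue  # ignore records dated after `today`
        for name, days in windows.items():
            if age < days:
                counts[name][item_id] = counts[name].get(item_id, 0) + 1
    return counts


report = checkouts_by_window(CHECKOUTS, today=date(2015, 2, 24))
```

Keeping one count per window, rather than a single blended number, leaves each figure separately inspectable.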
Long tail issue generally and at own institution
Options: random selection out of tail for exposure, subject-filtered selection
Important that UI expose long-tail possibilities prominently, above the page-fold
Usage data from other institutions and ILL balance out the local institution’s biases
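Illustration of the two long-tail options noted above (random selection and subject-filtered selection): a rough sketch, assuming the tail is defined as items at or below a checkout threshold. The record layout, the threshold, and the function names are hypothetical.

```python
import random

# Hypothetical records: item id -> (subject, total checkouts).
ITEMS = {
    "item-1": ("history", 120),
    "item-2": ("history", 0),
    "item-3": ("botany", 1),
    "item-4": ("botany", 0),
    "item-5": ("history", 1),
    "item-6": ("music", 45),
}


def long_tail(items, max_uses=1):
    """Return ids of items used at most `max_uses` times (assumed tail definition)."""
    return sorted(i for i, (_, uses) in items.items() if uses <= max_uses)


def sample_tail(items, k, subject=None, max_uses=1, seed=None):
    """Pick up to `k` tail items at random, optionally restricted to one subject."""
    pool = [
        i for i in long_tail(items, max_uses)
        if subject is None or items[i][0] == subject
    ]
    rng = random.Random(seed)  # a seed makes a given selection repeatable
    return rng.sample(pool, min(k, len(pool)))


any_subject = sample_tail(ITEMS, k=2, seed=1)
botany_only = sample_tail(ITEMS, k=2, subject="botany", seed=1)
```

The selected ids could then be handed to whatever UI component surfaces them above the fold.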
Privacy
Opt-in for users willing to share their usage data
Huddersfield University (England): more liberal approach to data exposure, including access to clustering (users who borrowed this also borrowed that) and usage by academic course and school
IP-based web stats inherently less risky than personal ID-based circulation data
Anonymization tools important
Clustering dangerous
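Illustration of "users who borrowed this also borrowed that" clustering combined with the privacy concerns above: a minimal sketch, not a description of how Huddersfield or any other institution does it. Patron ids are replaced with keyed hashes before counting, and item pairs shared by fewer than a minimum number of patrons are dropped. The key, the threshold, and the loan layout are assumptions; keyed hashing is pseudonymization and on its own may not amount to full anonymization.

```python
import hashlib
import hmac
from collections import Counter
from itertools import combinations

# Hypothetical secret held only by the local institution.
SECRET_KEY = b"replace-with-a-locally-held-secret"


def pseudonymize(patron_id, key=SECRET_KEY):
    """Replace a patron id with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(key, patron_id.encode("utf-8"), hashlib.sha256).hexdigest()


def co_borrow_counts(loans, min_patrons=2):
    """Count distinct patrons who borrowed both items of each pair.

    `loans` is an iterable of (patron_id, item_id). Pairs seen for fewer
    than `min_patrons` patrons are dropped (assumed safeguard for small groups).
    """
    baskets = {}
    for patron_id, item_id in loans:
        baskets.setdefault(pseudonymize(patron_id), set()).add(item_id)
    pairs = Counter()
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            pairs[(a, b)] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_patrons}


# Hypothetical loan records; the repeated loan by p1 is counted once.
LOANS = [
    ("p1", "A"), ("p1", "B"),
    ("p2", "A"), ("p2", "B"), ("p2", "C"),
    ("p3", "A"), ("p3", "C"),
    ("p1", "A"),
]

related = co_borrow_counts(LOANS, min_patrons=2)
```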