We had some new voices this time! Good to see.



The chat took a moment to applaud the Repository Support Project's new DSpace Course. Contributors Stuart Lewis, Chris Yates, and Claudia Jurgen were all present on the chat. The Course is looking for new contributors, particularly with regard to Manakin/XMLUI; if you can help, please contact Stuart Lewis.


Mark Wood pointed out (as have several emails to the list during this week's discussion) that two sharply differing concepts lurk behind the word "statistics": the capture of repository events as they occur, and the distillation of raw event data into useful reports. "Statistics pull patterns out of collections of individual cases," said Mark.
Moreover, not all reports are statistical in nature; some (such as "what's been deposited recently" lists) just regurgitate part of the event stream.

Given accessible event-stream data, many statistical analyses can be done wholly outside of DSpace, and it is unrealistic to expect DSpace to create analyses for every imaginable use-case. Some common use-cases, however, may need to become part of DSpace proper; the trouble is defining them.


All access-related reports (accesses/downloads) should filter out as many crawlers as feasible.

  • item accesses, total as well as by month and year
  • bitstream downloads, as above
  • accesses and downloads by author, as above; authors also want to know what their most popular items are
  • incoming links from other websites (via referrers; note that referrer spam may become a problem)
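As a rough illustration of turning raw event data into one of the reports above, the sketch below counts item accesses per month after discarding likely crawlers. It is not DSpace code: the event record layout, the `CRAWLER_HINTS` list, and the function names are all assumptions made for the example, and a real crawler filter would need a much longer, maintained list.

```python
from collections import Counter
from datetime import datetime

# Hypothetical substrings for spotting crawler user agents (assumption, far from complete).
CRAWLER_HINTS = ("bot", "crawler", "spider", "slurp")


def is_crawler(user_agent):
    """Return True if the user-agent string contains a known crawler hint."""
    ua = user_agent.lower()
    return any(hint in ua for hint in CRAWLER_HINTS)


def accesses_by_month(events):
    """Count non-crawler accesses, keyed by (item_id, 'YYYY-MM')."""
    counts = Counter()
    for event in events:
        if is_crawler(event["user_agent"]):
            continue  # skip crawler traffic so it does not inflate the report
        month = event["timestamp"].strftime("%Y-%m")
        counts[(event["item_id"], month)] += 1
    return counts


# Made-up event records for illustration only; not real DSpace output.
events = [
    {"item_id": "123/1", "timestamp": datetime(2008, 3, 1, 10, 0), "user_agent": "Mozilla/5.0"},
    {"item_id": "123/1", "timestamp": datetime(2008, 3, 15, 9, 30), "user_agent": "Googlebot/2.1"},
    {"item_id": "123/1", "timestamp": datetime(2008, 4, 2, 14, 5), "user_agent": "Mozilla/5.0"},
]
monthly = accesses_by_month(events)
```

Yearly and all-time totals can be derived by summing the monthly counts, so only the finest-grained aggregate needs to be stored. The same grouping works for bitstream downloads or per-author reports if the event records carry the relevant key.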

Other possibilities mentioned included:

  • alerts for download "spikes" over a short period of time
  • on item pages, time of last download
  • "popular items in this repository" (recent, total, and monthly, though it was noted that displaying this information to end-users tends to reinforce an unjust power-law distribution of downloads)
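One simple way a download-spike alert could work is to compare each day's count with the average of the days before it. The sketch below assumes daily download counts are already available; the function name, the `factor` threshold, and the `min_downloads` floor are illustrative choices, not anything proposed on the chat.

```python
def find_spikes(daily_counts, factor=3.0, min_downloads=10):
    """Return indexes of days whose count exceeds `factor` times the mean of all earlier days.

    `min_downloads` suppresses alerts on items with very little traffic,
    where a jump from 1 to 4 downloads would otherwise count as a spike.
    """
    spikes = []
    for i in range(1, len(daily_counts)):
        baseline = sum(daily_counts[:i]) / i  # mean of all earlier days
        today = daily_counts[i]
        if today >= min_downloads and today > factor * baseline:
            spikes.append(i)
    return spikes


# Made-up daily download counts for one item; the fifth day is the outlier.
daily = [4, 5, 3, 6, 40, 5]
spike_days = find_spikes(daily)
```

A production version would probably use a sliding window rather than the full history, so that a past spike does not raise the baseline indefinitely.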

Geolocating accesses was not perceived as vital.


Claudia Jurgen noted that the EU has very strict privacy laws that may prevent the collection or retention of information that could identify individual persons. DSpace may therefore not be able to track individuals' site behavior (to put toward "more like this" links or the like).


Technical issues: The widely-praised Minho stats engine does not yet work with XMLUI, and no one on the chat knew of plans to adapt it. Mark Diggory noted that event-capture should be separated from log4j's error capturing.

Shane Beers pointed out that DSpace does not currently offer repository managers much information about the contents of their repositories, which is a significant worry vis-a-vis bitstream preservation. A list of bitstreams by MIME type would be a start.
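A list of bitstreams by MIME type amounts to a simple grouping. The sketch below assumes the bitstream names and MIME types have already been exported as pairs; how they would be read out of DSpace is not covered here, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def bitstreams_by_mime_type(bitstreams):
    """Group bitstream names by MIME type; input is an iterable of (name, mime_type) pairs."""
    grouped = defaultdict(list)
    for name, mime_type in bitstreams:
        grouped[mime_type].append(name)
    return dict(grouped)


# Made-up inventory for illustration only.
inventory = [
    ("thesis.pdf", "application/pdf"),
    ("dataset.csv", "text/csv"),
    ("article.pdf", "application/pdf"),
]
report = bitstreams_by_mime_type(inventory)

# Print a per-type count, most common formats first.
for mime_type, names in sorted(report.items(), key=lambda kv: -len(kv[1])):
    print(f"{mime_type}: {len(names)}")
```

Sorting by count puts the dominant formats at the top, while the rare types at the bottom are likely the ones a preservation review would want to look at first.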

DSpace also does not help managers investigate deposit patterns and growth. A readily-accessible list of recent deposits as well as a list of deposits per time period (separable by community/collection, so that different communities can be usefully compared) would be useful to repository administrators, and should be relatively easy to build via (or for research-tracking use-cases, metadata.