DSpace 1.6 and newer versions use Apache Solr as the application underlying the statistics. Solr enables performant searching of, and adding to, vast amounts of (usage) data.
Unlike previous versions, enabling statistics in DSpace does not require additional installation or customization. All the necessary software is included.
Each time a page or file is requested, the request is logged. The logging happens on the server side. Unlike Google Analytics, it does not require JavaScript in the visitor's browser to provide usage data.
The fields to be stored are defined in the file dspace/solr/statistics/conf/schema.xml.
The fields stored in a usage event by default are:
Code Block |
---|
<field name="type" type="integer" indexed="true" stored="true" required="true" />
<field name="id" type="integer" indexed="true" stored="true" required="true" />
<field name="ip" type="string" indexed="true" stored="true" required="false" />
<field name="time" type="date" indexed="true" stored="true" required="true" />
<field name="epersonid" type="integer" indexed="true" stored="true" required="false" />
<field name="continent" type="string" indexed="true" stored="true" required="false"/>
<field name="country" type="string" indexed="true" stored="true" required="false"/>
<field name="countryCode" type="string" indexed="true" stored="true" required="false"/>
<field name="city" type="string" indexed="true" stored="true" required="false"/>
<field name="longitude" type="float" indexed="true" stored="true" required="false"/>
<field name="latitude" type="float" indexed="true" stored="true" required="false"/>
<field name="owningComm" type="integer" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="owningColl" type="integer" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="owningItem" type="integer" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="dns" type="string" indexed="true" stored="true" required="false"/>
<field name="userAgent" type="string" indexed="true" stored="true" required="false"/>
<field name="isBot" type="boolean" indexed="true" stored="true" required="false"/>
|
The combination of type and id determines which resource (a community, collection or item page, or a file download) has been requested.
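The hypothetical shell helpers below sketch how type and id combine. They build a Solr query URL that counts the usage events logged for a single resource.

They rest on two assumptions:
- The numeric codes follow DSpace's org.dspace.core.Constants (0 = bitstream, 2 = item, 3 = collection, 4 = community).
- The default Solr address is a local guess that should be replaced by your solr.log.server value.

```shell
# Hypothetical helper: map a DSpace resource kind to the numeric "type" code
# stored with each usage event (assumed to follow org.dspace.core.Constants).
dspace_type_code() {
  case "$1" in
    bitstream)  echo 0 ;;   # file downloads
    item)       echo 2 ;;
    collection) echo 3 ;;
    community)  echo 4 ;;
    *) echo "unknown resource kind: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: build a Solr select URL that counts the usage events
# logged for one resource. SOLR_STATS falls back to an assumed local address;
# set it to your solr.log.server value instead.
stats_query_url() {
  code=$(dspace_type_code "$1") || return 1
  # "%3A" is a URL-encoded ":" and "+" an encoded space, so the query reads
  # "type:<code> AND id:<id>". rows=0 returns only the total (numFound).
  echo "${SOLR_STATS:-http://localhost:8080/solr/statistics}/select?q=type%3A${code}+AND+id%3A$2&rows=0"
}
```

For example, `curl -s "$(stats_query_url item 1234)"` should return a response whose numFound value is the number of logged page views for item 1234.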
In the XMLUI, statistics can be accessed from the lower end of the navigation menu. In the JSPUI, a view statistics button appears at the bottom of pages for which statistics are available.
If you are not seeing these links or buttons, it's likely that they are only enabled for administrators in your installation. Change the configuration parameter "statistics.item.authorization.admin" to false in order to make statistics visible for all repository visitors.
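Editing dspace.cfg by hand works fine. For scripted deployments, a property could be set with a minimal helper like the one below. The function name set_cfg_property is hypothetical, and the sketch assumes the simple `key = value` layout used in dspace.cfg.

```shell
# Hypothetical helper: set "key = value" in a dspace.cfg-style file.
# An existing line for the key (commented out or not) is rewritten in place;
# otherwise the key is appended. Values containing "|", "&" or "\" are not
# handled, because they are special in the sed replacement below.
set_cfg_property() {
  file=$1 key=$2 value=$3
  # Escape the dots in the key so they match literally in the regex.
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -Eq "^[#[:space:]]*${key_re}[[:space:]]*=" "$file"; then
    sed -E "s|^[#[:space:]]*${key_re}[[:space:]]*=.*|${key} = ${value}|" \
      "$file" > "$file.tmp" || return 1
    # Copy back over the original so its permissions are kept.
    cat "$file.tmp" > "$file" && rm -f "$file.tmp"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, `set_cfg_property [dspace]/config/dspace.cfg statistics.item.authorization.admin false` makes the statistics links visible to all visitors.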
When opened from the repository homepage, the statistics page displays the 10 most popular items of the entire repository.
The following statistics are available for the community home pages:
The following statistics are available for the collection home pages:
The following statistics are available for the item home pages:
The DSpace Statistics Implementation is a client/server architecture based on Solr for collecting usage events in the JSPUI and XMLUI user interface applications of DSpace. Solr runs as a separate web application. An instance of Apache HttpClient is used to allow parallel requests to log statistics events into this Solr instance. The Usage Event framework has a couple of EventListeners installed.
In the dspace.cfg file, review the following fields to make sure they are uncommented:
Property | Example Value | Type | Informational Note |
---|---|---|---|
solr.log.server | solr.log.server = http://127.0.0.1/solr/statistics | String | Used by the SolrLogger client class to connect to the Solr server over HTTP and perform updates and queries. In most cases, this can (and should) be set to localhost (or 127.0.0.1). To check the value, send a query such as http://127.0.0.1/solr/statistics/select?q=*:* to the server. Assuming you get an HTTP 200 OK response, you should set solr.log.server to that statistics URL. |
solr.spiderips.urls | solr.spiderips.urls = [URL], [URL], ... | String | List of URLs from which spider files are downloaded into [dspace]/config/spiders. These files contain lists of known spider IPs. The SolrLogger uses them to flag usage events with an "isBot" field, or to ignore those events entirely. The files can be maintained with the stats-util script in your [dspace]/bin directory. |
solr.dbfile | solr.dbfile = [path to the GeoLiteCity database file] | String | The GeoLiteCity database file used by LocationUtils to calculate the location of client requests based on IP address. During the Ant build process (both fresh_install and update), this file is downloaded from http://www.maxmind.com/app/geolitecity if a new version has been published or if it is absent from your [dspace]/config directory. |
solr.resolver.timeout | solr.resolver.timeout = 200 | Integer | Timeout in milliseconds for DNS resolution of origin hosts/IPs. Setting this value too high may result in Solr exhausting your connection pool. |
useProxies | useProxies = true | boolean | Causes statistics logging to look for the X-Forwarded-For header, so it can detect the IP of clients that accessed DSpace through a proxy service (e.g. Apache mod_proxy). Note: this setting is found in the DSpace Logging section of dspace.cfg. |
statistics.item.authorization.admin | statistics.item.authorization.admin = true | boolean | When set to true, only general administrators and collection and community administrators can access the statistics from the web user interface. Restrictions are based on access rights to the community, collection and item pages, and the links to access statistics are hidden from users who are not logged in as administrators. Setting this property to false displays the links to anyone, making the statistics publicly available. |
solr.statistics.logBots | solr.statistics.logBots = true | boolean | If false, usage events from an IP detected as a spider are not logged at all. If true, such events are still logged. |
solr.statistics.query.filter.spiderIp | solr.statistics.query.filter.spiderIp = false | boolean | If true, statistics queries will filter out spider IPs. Use with caution, as this often results in extremely long query strings. |
solr.statistics.query.filter.isBot | solr.statistics.query.filter.isBot = true | boolean | If true, statistics queries will filter out events flagged with the "isBot" field. This is the recommended method of filtering spiders from statistics. |
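The check described for solr.log.server can be scripted. The sketch below uses two hypothetical helpers: solr_ping_url derives a cheap "match everything, return no rows" query from a candidate solr.log.server value, and solr_status prints only the HTTP status code that curl receives. It assumes curl is installed.

```shell
# Hypothetical helper: build a match-all query URL (no rows returned)
# from a solr.log.server value; a trailing slash is tolerated.
solr_ping_url() {
  base=${1%/}
  echo "${base}/select?q=*:*&rows=0"
}

# Hypothetical helper: print only the HTTP status code of that query.
# curl prints 000 if no connection could be made at all.
solr_status() {
  curl -s -o /dev/null -w '%{http_code}' "$(solr_ping_url "$1")"
}
```

For example, `[ "$(solr_status http://127.0.0.1/solr/statistics)" = 200 ] && echo "statistics core reachable"` confirms the value before you put it into dspace.cfg.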
...
Code Block |
---|
cd [dspace-source]/dspace
mvn package
cd [dspace-source]/dspace/target/dspace-<version>-build.dir
ant -Dconfig=[dspace]/config/dspace.cfg update
cp -R [dspace]/webapps/* [TOMCAT]/webapps
|
The last step is only needed if you are not mounting [dspace]/webapps directly into your Tomcat, Resin or Jetty host (the recommended practice). If you only need to build the statistics, and don't make any changes to other web applications, you can replace the copy step above with:
Code Block |
---|
cp -R [dspace]/webapps/solr [TOMCAT]/webapps |
...
_Again, this is only needed if you are not mounting [dspace]/webapps directly into your Tomcat, Resin or Jetty host (the recommended practice)._
Restart your webapps (Tomcat/Jetty/Resin)
...
The following dspace.cfg fields are only applicable to the older statistics solution:
Code Block |
---|
###### Statistical Report Configuration Settings ######
# should the stats be publicly available? should be set to false if you only
# want administrators to access the stats, or you do not intend to generate
# any reports
report.public = false
# directory where live reports are stored
report.dir = ${dspace.dir}/reports/
|
These fields are not used by the new 1.6 statistics; they only relate to the statistics from previous DSpace releases.
If you have upgraded from a previous version of DSpace, converting older log files ensures that you carry over older usage stats from before the upgrade.
The command line interface (CLI) scripts can be used to clean additional spider traffic out of the usage database and to perform other maintenance tasks.
If required, the Solr server can be optimized by running:
Code Block |
---|
[dspace]/bin/stats-util -o
|
More information on how these Solr server optimizations work can be found at http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations.
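For unattended runs (for example from cron), a small wrapper can fail loudly when the script is missing. Both the function name optimize_stats and the DSPACE_DIR variable are hypothetical. Set DSPACE_DIR to your [dspace] installation directory.

```shell
# Hypothetical wrapper around the optimization command shown above.
# DSPACE_DIR is an assumption: point it at your [dspace] directory.
optimize_stats() {
  script="${DSPACE_DIR:-/dspace}/bin/stats-util"
  if [ ! -x "$script" ]; then
    echo "stats-util not found or not executable at $script" >&2
    return 1
  fi
  # -o asks stats-util to optimize the statistics Solr index.
  "$script" -o
}
```

An optimize merges the index into fewer segments and is I/O intensive, so schedule it for a quiet period.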
In DSpace 1.6.x, each Solr event was committed to the Solr server individually. For high-traffic DSpace installations, this resulted in a huge number of small Solr commits and a very high load on the Solr server.
This has been resolved in DSpace 1.7 by committing usage events to the Solr server only every 15 minutes. As a result, storing a usage event can be delayed by up to 15 minutes. If required, this interval can be altered by changing the maxTime property in the following file:
Code Block |
---|
[dspace]/solr/statistics/conf/solrconfig.xml
|
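maxTime is given in milliseconds, so the 15-minute interval corresponds to 15 × 60 × 1000 = 900000. The hypothetical helper below reports the interval currently configured in that file. It assumes the value sits on a single line as `<maxTime>N</maxTime>`, and uses only the first such element.

```shell
# Hypothetical helper: print the autoCommit interval (in whole minutes)
# configured in a solrconfig.xml file. Assumes <maxTime>N</maxTime> on one
# line, with N in milliseconds.
commit_interval_minutes() {
  ms=$(sed -n 's:.*<maxTime>\([0-9][0-9]*\)</maxTime>.*:\1:p' "$1" | head -n 1)
  if [ -z "$ms" ]; then
    echo "no <maxTime> element found in $1" >&2
    return 1
  fi
  # Integer division: 900000 ms -> 15 minutes.
  echo $((ms / 60000))
}
```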
When the web user interface does not offer you the statistics you need, you can greatly expand the reports by querying the SOLR index directly.
Query:
Code Block |
---|
http://localhost:8080/solr/statistics/select?indent=on&version=2.2&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=&facet=true&facet.field=epersonid&q=type:0
|
Explained:
facet.field=epersonid: group the results by epersonid, which is the user ID.
q=type:0: restrict the query to bitstream (file download) events only.
Code Block |
---|
<lst name="facet_counts">
<lst name="facet_fields">
<lst name="epersonid">
<int name="66">1167</int>
<int name="117">251</int>
<int name="52">42</int>
<int name="19">36</int>
<int name="88">20</int>
<int name="112">18</int>
<int name="110">9</int>
<int name="96">0</int>
</lst>
</lst>
</lst>
|
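To post-process such a response in a script, the facet entries can be turned into plain "value count" lines. The helper name facet_counts is hypothetical. The sketch assumes the response was requested with indent=on, so each `<int name="...">N</int>` entry sits on its own line.

```shell
# Hypothetical helper: print "value count" pairs for one facet field from a
# Solr XML response read on stdin (one <int> entry per line assumed).
facet_counts() {
  field=$1
  # Only look between <lst name="FIELD"> and the next </lst>, so the
  # <int> entries of the response header (status, QTime) are skipped.
  sed -n "/<lst name=\"${field}\">/,/<\/lst>/s:^[[:space:]]*<int name=\"\([^\"]*\)\">\([0-9][0-9]*\)</int>.*:\1 \2:p"
}
```

The following command should print one line per user, such as `66 1167`:

`curl -s "http://localhost:8080/solr/statistics/select?indent=on&rows=0&facet=true&facet.field=epersonid&q=type:0" | facet_counts epersonid`

Solr returns at most 100 facet values by default (facet.limit). Add facet.limit=-1 to the URL if you need all of them.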