
...

Command used:	`[dspace]/bin/dspace stats-util`
Java class:	org.dspace.statistics.util.StatisticsClient
Arguments (short and long forms):	Description
`-b` or `--reindex-bitstreams`	Reindex the bitstreams to ensure we have the bundle name
`-r` or `--remove-deleted-bitstreams`	While indexing the bundle names remove the statistics about deleted bitstreams
`-u` or `--update-spider-files`	Update Spider IP Files from internet into `[dspace]/config/spiders`. Downloads Spider files identified in `dspace.cfg` under property `solr.spiderips.urls`. See Configuration settings for Statistics
`-f` or `--delete-spiders-by-flag`	Delete Spiders in Solr By isBot Flag. Will prune out all records that have `isBot:true`
`-i` or `--delete-spiders-by-ip`	Delete Spiders in Solr By IP Address, DNS name, or Agent name. Will prune out all records that match spider identification patterns.
`-m` or `--mark-spiders`	Update isBot Flag in Solr. Marks any records currently stored in statistics that have IP addresses matching entries in spiders files.
`-h` or `--help`	Calls up this brief help table at command line.

...

Which of these options to use is up to you. To keep spider entries in your repository, mark them with "-m"; they are then excluded from statistics queries when "solr.statistics.query.filter.isBot = true" is set in dspace.cfg. To keep spiders out of the Solr repository altogether, use the "-if" option and they will be removed immediately.
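The difference between marking and deleting can be illustrated with a small in-memory model. This is only a sketch in Python, not DSpace code: the record layout, the `SPIDER_IPS` set, and all function names are hypothetical, and matching is reduced to IP addresses only (the real tool also matches DNS names and agent names).

```python
# Illustrative in-memory model; none of these names come from DSpace itself.
SPIDER_IPS = {"192.0.2.10"}  # hypothetical stand-in for the spider IP files


def mark_spiders(records, spider_ips):
    """Roughly what "-m" does: flag matching records but keep them stored."""
    for rec in records:
        if rec["ip"] in spider_ips:
            rec["isBot"] = True
    return records


def delete_spiders_by_flag(records):
    """Roughly what "-f" does: drop every record flagged isBot:true."""
    return [rec for rec in records if not rec.get("isBot", False)]


def query(records, filter_is_bot=True):
    """Models the isBot query filter: flagged records stay stored but are hidden."""
    if filter_is_bot:
        return [rec for rec in records if not rec.get("isBot", False)]
    return list(records)


records = [
    {"id": 1, "ip": "192.0.2.10"},    # matches the spider list
    {"id": 2, "ip": "198.51.100.7"},  # ordinary visitor
]

marked = mark_spiders(records, SPIDER_IPS)
stored_after_mark = len(marked)                        # both records still stored
visible_ids = [r["id"] for r in query(marked)]         # only the non-bot record
remaining_ids = [r["id"] for r in delete_spiders_by_flag(marked)]

print(stored_after_mark, visible_ids, remaining_ids)
```

In this model, marking keeps the record count unchanged and only affects what filtered queries return, while deleting shrinks the store itself.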

...

Technical implementation details

After sharding, the Solr data cores are located in the [dspace.dir]/solr directory. There is no need to define the location of each individual core in solr.xml because the cores are discovered automatically at runtime. This discovery happens in the static method located in the org.dspace.statistics.SolrLogger class, which stores the cores in the statisticYearCores list. Each time a query is made to Solr, these cores are added as shards by the addAdditionalSolrYearCores method. The cores share a common configuration copied from your original statistics core, so subsequent ant updates should not cause any issues.
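As a rough illustration of the idea (discover the year cores, then attach them to each query as shards), the following Python sketch builds a comma-separated value of the kind Solr's distributed search accepts in its `shards` parameter. It is not the SolrLogger implementation: the `statistics-YYYY` naming pattern, the base URL, the inclusion of the main core, and both function names are assumptions made for the example.

```python
import re

# Assumed naming scheme for per-year cores; adjust to match your installation.
YEAR_CORE = re.compile(r"^statistics-(\d{4})$")


def find_year_cores(dir_names):
    """Pick out directory names that look like per-year statistics cores."""
    return sorted(name for name in dir_names if YEAR_CORE.match(name))


def shards_param(base_url, main_core, year_cores):
    """Build a comma-separated shard list, one host/path entry per core."""
    # Shard entries are written without the URL scheme.
    host = base_url.split("://", 1)[-1].rstrip("/")
    return ",".join(f"{host}/{core}" for core in [main_core, *year_cores])


# Hypothetical contents of the solr directory.
dirs = ["statistics", "statistics-2011", "search", "statistics-2012", "statistics-old"]
year_cores = find_year_cores(dirs)
shards = shards_param("http://localhost:8080/solr", "statistics", year_cores)

print(year_cores)
print(shards)
```

Because the list is rebuilt from whatever directories exist, nothing needs to be registered per core in this model, which mirrors why solr.xml does not have to list each year core.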

The actual sharding of the original Solr core into individual cores by year is done in the shardSolrIndex method in the org.dspace.statistics.SolrLogger class. The sharding starts by faceting on the time to obtain the years present in the data. Once we have the years from our logs, we query the main Solr data server for all information on each year and download it as CSV files. When we have all data for one year, we upload it to the newly created core for that year using the update CSV handler. Once all data for one year has been uploaded, it is removed from the main Solr core. Doing it this way means that if Solr crashes, we do not need to start from scratch.
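The order of these steps can be sketched with plain Python data structures. This is a simplified in-memory model, not the shardSolrIndex code: the main core is a list of dictionaries, each year core is a list in a dictionary, the CSV round trip goes through a string buffer instead of Solr's handlers, and the `time` and `id` field names are assumptions for the example.

```python
import csv
import io
from collections import Counter


def shard_by_year(main_core, year_cores):
    """Move records from main_core (list of dicts) into per-year lists."""
    # Step 1: "facet" on the time value to learn which years are present.
    years = Counter(rec["time"][:4] for rec in main_core)

    for year in sorted(years):
        # Step 2: export all records of this year as CSV.
        rows = [rec for rec in main_core if rec["time"].startswith(year)]
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

        # Step 3: load the CSV into the core for that year.
        buf.seek(0)
        year_cores.setdefault(year, []).extend(dict(r) for r in csv.DictReader(buf))

        # Step 4: only after the upload succeeded, remove the year from the
        # main core, so an interruption never loses already exported data.
        main_core[:] = [rec for rec in main_core if not rec["time"].startswith(year)]

    return year_cores


main = [
    {"id": "1", "time": "2011-05-01T00:00:00Z"},
    {"id": "2", "time": "2012-01-15T00:00:00Z"},
    {"id": "3", "time": "2011-11-30T00:00:00Z"},
]
cores = shard_by_year(main, {})

print(sorted(cores))
print([r["id"] for r in cores["2011"]])
print(main)
```

Deleting a year from the main core only after that year has been loaded is what makes the procedure resumable: a rerun after a crash simply facets again and continues with whatever years are still in the main core.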

...

Testing Solr Shards
