Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Command used:

[dspace]/bin/dspace stats-log-converter

Java class:

org.dspace.statistics.util.ClassicDSpaceLogConverter

Arguments short and long forms):

Description

-i or --in

Input file

-o or --out

Output file

-m or --multiple

Adds a wildcard at the end of input and output, so it would mean if -i dspace.log -m was specified, dspace.log* would be converted. (For example, the following files would be included because of this argumenti.e. all of the following: dspace.log, dspace.log.1, dspace.log.2, dspace.log.3, etc.)

-n or --newformat

If the log files have been created with DSpace 1.6 or newer

-v or --verbose

Display verbose output (helpful for debugging)

-h or --help

Help

The command loads the intermediate log files that have been created by the aforementioned script into SOLRSolr.

Command used:

[dspace]/bin/dspace stats-log-importer

Java class:

org.dspace.statistics.util.StatisticsImporter

Arguments (short and long forms):

Description

-i or --in

input file

-m or --multiple

Adds a wildcard at the end of the input, so it would mean dspace.log* would be imported

-s or --skipdns

To skip the reverse DNS lookups that work out where a user is from. (The DNS lookup finds the information about the host from its IP address, such as geographical location, etc. This can be slow, and wouldn't work on a server not connected to the internet.)

-v or --verbose

Display verbose ouput (helpful for debugging)

-l or --local

For developers: allows you to import a log file from another system, so because the handles won't exist, it looks up random items in your local system to add hits to instead.

-h or --help

Help

...

Technical implementation details

After sharding, the SOLR data cores are located in the [dspace.dir]/solr directory. There is no need to define the location of each individual core in solr.xml because they are automatically retrieved at runtime. This retrieval happens in the static method located in theorg.dspace.statistics.SolrLogger class. These cores are stored in the statisticYearCores list each time a query is made to the solr these cores are added as shards by the addAdditionalSolrYearCores method. The cores share a common configuration copied from your original statistics core. Therefore, no issues should be resulting from subsequent ant updates.

The actual sharding of the of the original solr core into individual cores by year is done in the shardSolrIndex method in the org.dspace.statistics.SolrLogger class. The sharding is done by first running a facet on the time to get the facets split by year. Once we have our years from our logs we query the main solr data server for all information on each year & download these as csv's. When we have all data for one year we upload it to the newly created core of that year by using the update csvhandler. One all data of one year has been uploaded that data is removed from the main solr (by doing it this way if our solr crashes we do not need to start from scratch).