...

Command used:

[dspace]/bin/dspace solr-export-statistics

Java class:

org.dspace.util.SolrImportExport

Arguments (short and long forms) and Description:

-i or --index-name

optional, the name of the index to process. "statistics" is the default

-l or --last <integer>

optionally export only the last <integer> days' worth of statistics

-d or --directory

optional, directory to use for storing the exported files. By default, [dspace]/solr-export is used. If that is not appropriate (due to storage concerns), we recommend you use this option to specify a more appropriate location.

-f or --force-overwrite

optional, overwrite export file if it exists (DSpace 6.1 and later)
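
For example, a typical invocation might look like the following (the export directory shown is only an illustration, not a required path):

Code Block
# Hypothetical example: export the last 30 days of statistics to /data/solr-export,
# overwriting any export files already present there (DSpace 6.1 and later).
[dspace]/bin/dspace solr-export-statistics -i statistics -l 30 -d /data/solr-export -f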

Import SOLR statistics, for restoring lost data or moving to another server

Command used:

[dspace]/bin/dspace solr-import-statistics

Java class:

org.dspace.util.SolrImportExport

Arguments (short and long forms) and Description:

-i or --index-name

optional, the name of the index to process. "statistics" is the default

-c or --clear

optional, clears the contents of the existing stats core before importing

-d or --directory

optional, directory which contains the files for importing. By default, [dspace]/solr-export is used. If that is not appropriate (due to storage concerns), we recommend you use this option to specify a more appropriate location.
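
For example, a typical restore might look like the following (the import directory shown is only an illustration):

Code Block
# Hypothetical example: clear the existing statistics core, then import
# the files previously exported to /data/solr-export.
[dspace]/bin/dspace solr-import-statistics -i statistics -c -d /data/solr-export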

Reindex SOLR statistics, for upgrades or whenever the Solr schema for statistics is changed

Command used:

[dspace]/bin/dspace solr-reindex-statistics

Java class:

org.dspace.util.SolrImportExport

Arguments (short and long forms) and Description:

-i or --index-name

optional, the name of the index to process. "statistics" is the default

-k or --keep

optional, tells the script to keep the intermediate export files for possible later use (by default all exported files are removed at the end of the reindex process).

-d or --directory

optional, directory to use for storing the exported files (temporarily, unless you also specify --keep; see above). By default, [dspace]/solr-export is used. If that is not appropriate (due to storage concerns), we recommend you use this option to specify a more appropriate location. Not sure about your space requirements? You can estimate the space required by looking at the current size of [dspace]/solr/statistics.

-f or --force-overwrite

optional, overwrite export file if it exists (DSpace 6.1 and later)

NOTE: solr-reindex-statistics is safe to run on a live site. The script stores incoming usage data in a temporary SOLR core, and then merges that new data into the reindexed data when the reindex process completes.
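
For example, a typical reindex might look like the following (the export directory shown is only an illustration; choose a location with enough free space):

Code Block
# Hypothetical example: reindex the statistics core, writing the intermediate
# export files to /data/solr-export and keeping them for possible later use.
[dspace]/bin/dspace solr-reindex-statistics -i statistics -k -d /data/solr-export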

...

Code Block
# At 12:00AM on January 1, "shard" the DSpace Statistics Solr index.  Ensures each year has its own Solr index - this improves performance.
0 0 1 1 * [dspace]/bin/dspace stats-util -s

...

Info
Shard Naming

Prior to the release of DSpace 6.1, the shard names created were off by one year in timezones with a positive offset from GMT.

Shards created subsequent to this release may appear to skip by one year.
See DS-3437 in the DuraSpace JIRA for details.

Technical implementation details

After sharding, the SOLR data cores are located in the [dspace.dir]/solr directory. There is no need to define the location of each individual core in solr.xml because they are automatically retrieved at runtime. This retrieval happens in a static method of the org.dspace.statistics.SolrLogger class. These cores are stored in the statisticYearCores list; each time a query is made to Solr, they are added as shards by the addAdditionalSolrYearCores method. The cores share a common configuration copied from your original statistics core, so subsequent ant updates should not cause any issues.

The actual sharding of the original Solr core into individual cores by year is done in the shardSolrIndex method of the org.dspace.statistics.SolrLogger class. The sharding is done by first running a facet on the time field to split the results by year. Once we have the years from the logs, we query the main Solr data server for all information on each year and download it as CSV. When we have all data for one year, we upload it to the newly created core for that year using the CSV update handler. Once all data for one year has been uploaded, that data is removed from the main Solr core (by doing it this way, if Solr crashes we do not need to start from scratch).
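
As a rough sketch of that per-year round trip (not the actual implementation, which lives in SolrLogger.shardSolrIndex), the same steps could be expressed as plain Solr HTTP requests; the host, port, core names, year and row limit below are assumptions for illustration only:

Code Block
# Hypothetical sketch of the per-year round trip, assuming Solr is reachable at
# localhost:8080 and the per-year core statistics-2016 already exists.

# 1. Facet on the time field to find which years are present in the main core.
curl "http://localhost:8080/solr/statistics/select?q=*:*&rows=0&facet=true&facet.range=time&facet.range.start=NOW/YEAR-10YEARS&facet.range.end=NOW&facet.range.gap=%2B1YEAR"

# 2. Download all documents for one year as CSV.
curl "http://localhost:8080/solr/statistics/select?q=time:[2016-01-01T00:00:00Z%20TO%202017-01-01T00:00:00Z]&rows=1000000&wt=csv" -o statistics-2016.csv

# 3. Upload that CSV into the per-year core via the CSV update handler.
curl "http://localhost:8080/solr/statistics-2016/update/csv?commit=true" --data-binary @statistics-2016.csv -H "Content-Type: text/csv; charset=utf-8"

# 4. Only after the upload succeeds, delete that year's data from the main core.
curl "http://localhost:8080/solr/statistics/update?commit=true" --data-binary "<delete><query>time:[2016-01-01T00:00:00Z TO 2017-01-01T00:00:00Z]</query></delete>" -H "Content-Type: text/xml"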

Info
Multiple Shard Fix (DSpace 6.1)

A bug exists in the DSpace 6.0 release that prevents Tomcat from starting when multiple shards are present.

To address this issue, the initialization of SOLR shards is deferred until the first SOLR-related request is processed.

See DS-3457 in the DuraSpace JIRA for details.

...

Testing Solr Shards