...

Technical implementation details

After sharding, the Solr data cores are located in the [dspace.dir]/solr directory. There is no need to define the location of each individual core in solr.xml because they are automatically retrieved at runtime. This retrieval happens in the static method located in the org.dspace.statistics.SolrLogger class. These cores are stored in the statisticYearCores list. Each time a query is made to Solr, these cores are added as shards by the addAdditionalSolrYearCores method. The cores share a common configuration copied from your original statistics core. Therefore, no issues should result from subsequent ant updates.
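To illustrate how a list of year cores can be attached to a query, here is a minimal sketch of building the value for Solr's standard shards request parameter (a comma-separated list of core addresses). It is not the actual body of addAdditionalSolrYearCores: the class name, the buildShardsParam helper, the host and the core names are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShardParamSketch {

    /**
     * Hypothetical helper: joins the main core and all year cores into one
     * comma-separated string suitable for Solr's "shards" parameter.
     */
    static String buildShardsParam(String mainCore, List<String> yearCores) {
        List<String> all = new ArrayList<>();
        all.add(mainCore);       // keep querying the main statistics core
        all.addAll(yearCores);   // plus every year core found at runtime
        return String.join(",", all);
    }

    public static void main(String[] args) {
        // Assumed addresses; a real list would be discovered at startup.
        List<String> statisticYearCores = Arrays.asList(
                "localhost:8080/solr/statistics-2010",
                "localhost:8080/solr/statistics-2011");

        String shards = buildShardsParam(
                "localhost:8080/solr/statistics", statisticYearCores);

        // The resulting value would be set as the "shards" query parameter.
        System.out.println("shards=" + shards);
    }
}
```

Because the list is rebuilt into a parameter on every query, no per-core entry in solr.xml is needed for the year cores to take part in a search.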

The actual sharding of the original Solr core into individual cores by year is done in the shardSolrIndex method in the org.dspace.statistics.SolrLogger class. The sharding is done by first running a facet on the time to get the facets split by year. Once we have our years from our logs, we query the main Solr data server for all information on each year and download these as CSVs. When we have all data for one year, we upload it to the newly created core of that year by using the update CSV handler. Once all data of one year have been uploaded, those data are removed from the main Solr (by doing it this way, if our Solr crashes we do not need to start from scratch).
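The order of operations described above can be sketched with a small in-memory simulation. This is not the shardSolrIndex implementation and makes no Solr calls: plain Java lists stand in for the main core and the year cores, and the Record class and shardByYear method are hypothetical names introduced for illustration. The point it shows is that a year's records are deleted from the main core only after they have been copied, so an interruption leaves earlier years finished and later years untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class YearShardingSketch {

    /** Hypothetical stand-in for one statistics record. */
    static final class Record {
        final int year;
        final String id;

        Record(int year, String id) {
            this.year = year;
            this.id = id;
        }
    }

    /** Moves records out of mainCore into one list per year. */
    static Map<Integer, List<Record>> shardByYear(List<Record> mainCore) {
        // Step 1: the "facet on time", reduced here to the distinct years.
        TreeSet<Integer> years = new TreeSet<>();
        for (Record r : mainCore) {
            years.add(r.year);
        }

        Map<Integer, List<Record>> yearCores = new TreeMap<>();
        for (final int year : years) {
            // Step 2: fetch everything for this year (the CSV export).
            List<Record> export = new ArrayList<>();
            for (Record r : mainCore) {
                if (r.year == year) {
                    export.add(r);
                }
            }
            // Step 3: load the export into the new core for that year.
            yearCores.put(year, export);
            // Step 4: only now delete that year from the main core.
            mainCore.removeIf(r -> r.year == year);
        }
        return yearCores;
    }

    public static void main(String[] args) {
        List<Record> mainCore = new ArrayList<>();
        mainCore.add(new Record(2010, "a"));
        mainCore.add(new Record(2011, "b"));
        mainCore.add(new Record(2010, "c"));

        Map<Integer, List<Record>> cores = shardByYear(mainCore);
        for (Map.Entry<Integer, List<Record>> e : cores.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " record(s)");
        }
        System.out.println("left in main core: " + mainCore.size());
    }
}
```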

...

Testing Solr Shards
