Versions Compared

Comment: Normalize Solr capitalization, tidy up run-ons and such

...

With the release of DSpace 1.6, a new statistics software component was added. DSpace's use of Solr for statistics makes it possible to have a database of statistics. With this in mind, there is the issue of the older log files and how a site can use them. The following process converts the existing log files and then imports them for Solr use. The user will need to perform this only once.

The Log Converter command converts log files from dspace.log into an intermediate format that can be inserted into Solr.

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="8e8743f0429656f9-c7422dff-4e1548be-bef693b5-925d5e510068c202fa38f8d0"><ac:plain-text-body><![CDATA[

Command used:

[dspace]/bin/dspace stats-log-converter

]]></ac:plain-text-body></ac:structured-macro>

Java class:

org.dspace.statistics.util.ClassicDSpaceLogConverter

Arguments (short and long forms):

Description

-i or --in

Input file. Read from standard input if omitted or "-".

-o or --out

Output file. Written to standard output if omitted or "-".

-m or --multiple

Adds a wildcard at the end of the input and output file names, so that dspace.log* is converted. (For example, the following files would be included because of this argument: dspace.log, dspace.log.1, dspace.log.2, dspace.log.3, etc.)

-n or --newformat

Use this option if the log files were created with DSpace 1.6.

-v or --verbose

Display verbose output (helpful for debugging)

-h or --help

Help
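
As an illustration, a converter run using only the options listed above might look like the following sketch. The DSPACE_BIN variable, its /dspace placeholder default, the function name convert_old_logs and the example paths are assumptions made for this example, not part of DSpace; substitute your own [dspace] directory.

```shell
#!/bin/sh
# Assumption: DSPACE_BIN points at [dspace]/bin/dspace; /dspace is only a placeholder.
DSPACE_BIN="${DSPACE_BIN:-/dspace/bin/dspace}"

# Convert dspace.log* into the intermediate format.
# $1 = input log base name, $2 = output base name (both hypothetical paths).
# -m adds the wildcard, so rotated files such as dspace.log.1 are included.
convert_old_logs() {
    "$DSPACE_BIN" stats-log-converter -i "$1" -o "$2" -m
}

# Example call (paths are assumptions):
# convert_old_logs /dspace/log/dspace.log /tmp/dspace.log.converted
```

Add -n if the logs were written by DSpace 1.6, and -v while debugging.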

The Log Importer command loads into Solr the intermediate log files that have been created by the Log Converter.

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="c1ecc1fac0d4f818-63a05305-45724674-b0a784ef-3e76a83469ce6ad2c46cedac"><ac:plain-text-body><![CDATA[

Command used:

[dspace]/bin/dspace stats-log-importer

]]></ac:plain-text-body></ac:structured-macro>

Java class:

org.dspace.statistics.util.StatisticsImporter

Arguments (short and long forms):

Description

-i or --in

Input file. Read from standard input if omitted or "-".

-m or --multiple

Adds a wildcard at the end of the input file name, so that dspace.log* is imported.

-s or --skipdns

To skip the reverse DNS lookups that work out where a user is from. (The DNS lookup finds the information about the host from its IP address, such as geographical location, etc. This can be slow, and wouldn't work on a server not connected to the internet.)

-v or --verbose

Display verbose output (helpful for debugging)

-l or --local

For developers: allows you to import a log file from another system. Because the handles won't exist locally, it looks up random items in your local system and adds the hits to those instead.

-h or --help

Help
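
The import step can be sketched in the same way. The first function loads previously converted files. The second relies on the standard input and standard output defaults described in the argument tables to pipe a single log straight from the converter into the importer. DSPACE_BIN, its /dspace placeholder default, both function names and the example paths are assumptions for illustration only.

```shell
#!/bin/sh
# Assumption: DSPACE_BIN points at [dspace]/bin/dspace; /dspace is only a placeholder.
DSPACE_BIN="${DSPACE_BIN:-/dspace/bin/dspace}"

# Import all intermediate files matching "$1"* into Solr.
# -s skips reverse DNS lookups, which helps on servers without internet access.
import_converted_logs() {
    "$DSPACE_BIN" stats-log-importer -i "$1" -m -s
}

# Convert one log file and import it in a single pass:
# the converter writes to standard output (no -o),
# and the importer reads from standard input (no -i).
convert_and_import() {
    "$DSPACE_BIN" stats-log-converter -i "$1" | "$DSPACE_BIN" stats-log-importer -s
}

# Example calls (paths are assumptions):
# import_converted_logs /tmp/dspace.log.converted
# convert_and_import /dspace/log/dspace.log
```

Dropping -s keeps the host lookups described above, at the cost of a slower import.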

Although the DSpace Log Converter applies basic spider filtering (Googlebot, Yahoo! Slurp, MSNbot), it is far from complete. Please refer to Filtering and Pruning Spiders for spider removal operations after converting your old logs.

...

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="5864e5c245351beb-1fa4cd72-4e6f4022-8e17b675-caa086d30b95c527283e2514"><ac:plain-text-body><![CDATA[

Command used:

[dspace]/bin/dspace stats-util

]]></ac:plain-text-body></ac:structured-macro>

Java class:

org.dspace.statistics.util.StatisticsClient

Arguments (short and long forms):

Description

-u or -update-spider-files

Update Spider IP Files from the internet into /dspace/config/spiders. Downloads Spider files identified in dspace.cfg under property solr.spiderips.urls. See DSpace SOLR Statistics Configuration.

-f or -delete-spiders-by-flag

Delete Spiders in Solr By isBot Flag. Will prune out all records that have isBot:true

-i or -delete-spiders-by-ip

Delete Spiders in Solr By IP Address. Will prune out all records that have IPs that match spider IPs.

-m or -mark-spiders

Update isBot Flag in Solr. Marks any records currently stored in statistics that have IP addresses matched in spiders files.

-h or -help

Calls up this brief help table at command line.

...

The usage of these options is open for the user to choose. If you wish to keep spider entries in your repository, you can just mark them using "-m" and they will be excluded from statistics queries when "solr.statistics.query.filter.isBot = true" in the dspace.cfg.

If you want to keep the spiders out of the Solr repository, just use the "-i" option and they will be removed immediately.
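
The two approaches can be wrapped in one small helper, sketched below. Refreshing the spider IP files with -u before marking or deleting is a suggestion of this example rather than a documented requirement, and DSPACE_BIN, its /dspace placeholder default and the function name prune_spiders are assumptions.

```shell
#!/bin/sh
# Assumption: DSPACE_BIN points at [dspace]/bin/dspace; /dspace is only a placeholder.
DSPACE_BIN="${DSPACE_BIN:-/dspace/bin/dspace}"

# $1 = "mark"   keeps spider records but flags them (-m)
# $1 = "delete" removes records whose IP matches a spider IP (-i)
prune_spiders() {
    case "$1" in
        mark)   flag=-m ;;
        delete) flag=-i ;;
        *)      echo "usage: prune_spiders mark|delete" >&2; return 2 ;;
    esac
    # Refresh the spider IP files first, then apply the chosen action.
    "$DSPACE_BIN" stats-util -u && "$DSPACE_BIN" stats-util "$flag"
}

# Example calls:
# prune_spiders mark
# prune_spiders delete
```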

There are guards in place to control what can be defined as an IP range for a bot. In [dspace]/config/spiders, spider IP address ranges have to be at least 3 subnet sections in length (12.34.56) and IP ranges can only be on the smallest subnet [123.123.123.0 - 123.123.123.255]. If not, loading that row will cause exceptions in the DSpace logs and exclude that IP entry.
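
To check an entry before adding it to a spider file, the rule can be approximated with a small shell function. This is a rough sketch of the rule as described here, not DSpace's own validation code: the function name is hypothetical, the "start - end" range syntax is assumed from the example above, and section values are not checked against the 0 to 255 limit.

```shell
#!/bin/sh
# Illustrative only: approximates the guards described above.
# Returns 0 if "$1" has at least three sections, or is a range
# whose two ends lie in the same smallest subnet.
is_acceptable_spider_entry() {
    entry=$1
    full='^[0-9]{1,3}(\.[0-9]{1,3}){3}$'        # exactly four sections
    atleast3='^[0-9]{1,3}(\.[0-9]{1,3}){2,3}$'  # three or four sections
    case "$entry" in
        *" - "*)
            start=${entry%% - *}
            end=${entry##* - }
            printf '%s\n' "$start" | grep -Eq "$full" || return 1
            printf '%s\n' "$end" | grep -Eq "$full" || return 1
            # Both ends must share their first three sections.
            [ "${start%.*}" = "${end%.*}" ]
            ;;
        *)
            printf '%s\n' "$entry" | grep -Eq "$atleast3"
            ;;
    esac
}

# Example calls:
# is_acceptable_spider_entry "12.34.56" && echo accepted
# is_acceptable_spider_entry "123.123.0.0 - 123.123.255.255" || echo rejected
```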

Routine

...

Solr Index Maintenance

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="3cce9599d49b3a24-06cb9078-4b154f4e-89a3868b-edc65c993bef018a66657383"><ac:plain-text-body><![CDATA[

Command used:

[dspace]/bin/dspace stats-util

]]></ac:plain-text-body></ac:structured-macro>

Java class:

org.dspace.statistics.util.StatisticsClient

Arguments (short and long forms):

Description

-o or -optimize

Run maintenance on the Solr index. Recommended to run daily, to prevent your servlet container from running out of memory.

Notes:

The usage of this option is strongly recommended. You should run this command daily (from crontab or your system's scheduler), to prevent your servlet container from running out of memory.
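
A daily run could be set up along these lines. The wrapper function name, the DSPACE_BIN variable with its /dspace placeholder default, and the 02:00 schedule in the commented crontab line are all assumptions; adjust them to your installation.

```shell
#!/bin/sh
# Assumption: DSPACE_BIN points at [dspace]/bin/dspace; /dspace is only a placeholder.
DSPACE_BIN="${DSPACE_BIN:-/dspace/bin/dspace}"

# Run the Solr index maintenance described above.
optimize_statistics_index() {
    "$DSPACE_BIN" stats-util -o
}

# Example crontab entry calling the command directly
# (the 02:00 schedule is an arbitrary choice):
# 0 2 * * * /dspace/bin/dspace stats-util -o
```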