Comment: Migrated to Confluence 4.0

...

The Log Converter program converts log files from dspace.log into an intermediate format that can be inserted into SOLR.

Command used:

[dspace]/bin/dspace stats-log-converter

Java class:

org.dspace.statistics.util.ClassicDSpaceLogConverter

Arguments (short and long forms):

Description

-i or -in

Input file

-o or -out

Output file

-m or -multiple

Adds a wildcard at the end of the input and output file names, so that every file matching dspace.log* is converted. (For example, the following files would be included because of this argument: dspace.log, dspace.log.1, dspace.log.2, dspace.log.3, etc.)

-n or -newformat

Use this option if the log files were created with DSpace 1.6

-v or -verbose

Display verbose output (helpful for debugging)

-h or -help

Help
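
As an illustration of how the arguments above combine, here is a minimal sketch of a conversion run. The DSPACE_DIR variable, the input and output paths, and the run_dspace helper are assumptions made for this example and are not part of DSpace. The helper only prints the command when the launcher is not found, so the sketch can be tried on a machine without DSpace installed.

```shell
#!/bin/sh
# Assumption: DSPACE_DIR stands in for the [dspace] installation directory.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"

# Hypothetical file locations; adjust them to your installation.
IN_LOG="$DSPACE_DIR/log/dspace.log"
OUT_LOG="$DSPACE_DIR/log/dspace.log.converted"

# Hypothetical helper: run the dspace launcher if it exists, otherwise
# print the command that would have been run.
run_dspace() {
  if [ -x "$DSPACE_DIR/bin/dspace" ]; then
    "$DSPACE_DIR/bin/dspace" "$@"
  else
    echo "would run: $DSPACE_DIR/bin/dspace $*"
  fi
}

# -i: input file, -o: output file, -n: logs created with DSpace 1.6,
# -v: verbose output. Add -m to also pick up dspace.log.1, dspace.log.2, etc.
run_dspace stats-log-converter -i "$IN_LOG" -o "$OUT_LOG" -n -v
```

Only the short argument forms listed in the table are used here.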

The stats-log-importer command loads the intermediate log files created by the Log Converter program into SOLR.

Command used:

[dspace]/bin/dspace stats-log-importer

Java class:

org.dspace.statistics.util.StatisticsImporter

Arguments (short and long forms):

Description

-i or --

Input file

-m or --

Adds a wildcard at the end of the input file name, so that every file matching dspace.log* is imported

-s or --

Skips the reverse DNS lookups that work out where a user is from. (The DNS lookup finds information about the host from its IP address, such as geographical location. This can be slow, and does not work on a server that is not connected to the internet.)

-v or --

Display verbose output (helpful for debugging)

-l or --

For developers: allows you to import a log file from another system. Because the handles in that file won't exist locally, the importer looks up random items in your local system and adds the hits to those instead.

-h or --

Help
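
The importer arguments can be combined along the following lines. This is a sketch only: DSPACE_DIR, the converted log path, and the import_log function are hypothetical names introduced for the example. It uses the short argument forms from the table and prints the command instead of running it when the dspace launcher is not present.

```shell
#!/bin/sh
# Assumption: DSPACE_DIR stands in for the [dspace] installation directory.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"

# Hypothetical function: import one converted log file.
#   $1 = file produced by the log converter
#   $2 = "yes" (default) to skip reverse DNS lookups with -s, "no" to keep them
import_log() {
  file=$1
  skip_dns=${2:-yes}

  # -i: input file, -v: verbose output
  set -- stats-log-importer -i "$file" -v
  if [ "$skip_dns" = "yes" ]; then
    # -s: skip the reverse DNS lookups (useful on servers without internet access)
    set -- "$@" -s
  fi

  if [ -x "$DSPACE_DIR/bin/dspace" ]; then
    "$DSPACE_DIR/bin/dspace" "$@"
  else
    echo "would run: $DSPACE_DIR/bin/dspace $*"
  fi
}

# Hypothetical path to the converter's output file.
import_log "$DSPACE_DIR/log/dspace.log.converted" yes
```

Skipping the DNS lookups by default is a choice made for this sketch because the lookups are described as slow; pass "no" as the second argument to keep them.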

...

Filtering and Pruning Spiders

...

Command used:

[dspace]/bin/dspace stats-util]]></ac:plain-text-body></ac:structured-macro>

Java class:

org.dspace.statistics.util.StatisticsClient

Arguments (short and long forms):

Description

-u or -update-spider-files

Update Spider IP Files from the internet into /dspace/config/spiders. Downloads Spider files identified in dspace.cfg under property solr.spiderips.urls. See DSpace SOLR Statistics Configuration.

-f or -delete-spiders-by-flag

Delete Spiders in Solr By isBot Flag. Will prune out all records that have isBot:true

-i or -delete-spiders-by-ip

Delete Spiders in Solr By IP Address. Will prune out all records that have IPs that match spider IPs.

-m or -mark-spiders

Update isBot Flag in Solr. Marks any records currently stored in statistics that have IP addresses matched in spiders files.

-h or -help

Calls up this brief help table at command line.
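
One possible way to chain these options is sketched below: refresh the spider IP files, mark matching records, then delete the flagged records. The ordering, the DSPACE_DIR and CONFIRM variables, and the stats_util and prune_spiders functions are assumptions made for this example. Because pruning deletes records, the sketch only prints the commands unless CONFIRM=yes is set.

```shell
#!/bin/sh
# Assumption: DSPACE_DIR stands in for the [dspace] installation directory.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"

# Hypothetical helper: run stats-util only when CONFIRM=yes, otherwise
# print what would be run.
stats_util() {
  if [ "${CONFIRM:-no}" = "yes" ]; then
    "$DSPACE_DIR/bin/dspace" stats-util "$@"
  else
    echo "dry run: $DSPACE_DIR/bin/dspace stats-util $*"
  fi
}

# Hypothetical workflow; stops at the first step that fails.
prune_spiders() {
  stats_util -u || return 1   # update spider IP files
  stats_util -m || return 1   # mark records whose IP matches a spider file
  stats_util -f || return 1   # delete records that have isBot:true
}

prune_spiders
```

Run it once without CONFIRM to review the commands, then again with CONFIRM=yes to execute them.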

...

To keep spiders out of the Solr repository, run the command with just the "-i" option and the spider records will be removed immediately.

There are guards in place to control what can be defined as an IP range for a bot. In [dspace]/config/spiders, spider IP address ranges have to be at least 3 subnet sections in length (for example, 123.123.123), and IP ranges can only be on the smallest subnet [123.123.123.0 - 123.123.123.255]. If not, loading that row will cause exceptions in the DSpace logs and that IP entry will be excluded.
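
The rule described above can be approximated with a small pre-check before an entry is added to a spider file. This is a sketch of one reading of that rule, not DSpace's own validation code: the octets and valid_spider_entry functions are hypothetical, and the accepted forms (a 3-section prefix, a full address, or a range within one final subnet) are an interpretation of the text.

```shell
#!/bin/sh
# octets ADDR: print the number of dot-separated sections in ADDR if each
# one is an integer from 0 to 255; fail otherwise.
octets() {
  case "$1" in
    ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;   # only digits and single dots allowed
  esac
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address on dots
  IFS=$old_ifs
  for o in "$@"; do
    [ "${#o}" -le 3 ] && [ "$o" -le 255 ] || return 1
  done
  echo "$#"
}

# valid_spider_entry ENTRY: succeed for "a.b.c", "a.b.c.d",
# or "a.b.c.x - a.b.c.y" (both ends in the same smallest subnet).
valid_spider_entry() {
  spec=$(printf '%s' "$1" | tr -d ' ')
  case "$spec" in
    *-*)
      lo=${spec%%-*}
      hi=${spec#*-}
      # Both ends must be full addresses that share the first three sections.
      [ "$(octets "$lo")" = 4 ] && [ "$(octets "$hi")" = 4 ] &&
        [ "${lo%.*}" = "${hi%.*}" ]
      ;;
    *)
      n=$(octets "$spec") && [ "$n" -ge 3 ] && [ "$n" -le 4 ]
      ;;
  esac
}

for sample in "123.123.123" "123.123.123.0 - 123.123.123.255" \
              "123.123" "123.123.0.0 - 123.123.255.255"; do
  if valid_spider_entry "$sample"; then
    echo "accepted: $sample"
  else
    echo "rejected: $sample"
  fi
done
```

The sketch does not check that the lower end of a range is smaller than the upper end, since the text does not state how such rows are handled.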

Routine SOLR Index Maintenance

Command used:

[dspace]/bin/dspace stats-util

Java class:

org.dspace.statistics.util.StatisticsClient

Arguments (short and long forms):

Description

-o or -optimize

Run maintenance on the SOLR index. Running this daily is recommended, to prevent your servlet container from running out of memory.
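
Since a daily run is recommended, the optimize call could be wrapped in a small script and scheduled, for instance with cron. The sketch below assumes a DSPACE_DIR variable for the [dspace] directory; the script name, its location, and the 03:00 schedule in the comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /usr/local/bin/dspace-stats-optimize.
# Example crontab entry (assumed schedule: every day at 03:00):
#   0 3 * * * /usr/local/bin/dspace-stats-optimize

# Assumption: DSPACE_DIR stands in for the [dspace] installation directory.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"

# Run stats-util with -o (optimize); report an error if the launcher is missing.
optimize_stats() {
  if [ -x "$DSPACE_DIR/bin/dspace" ]; then
    "$DSPACE_DIR/bin/dspace" stats-util -o
  else
    echo "dspace launcher not found at $DSPACE_DIR/bin/dspace" >&2
    return 1
  fi
}

optimize_stats || echo "optimize step did not run"
```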

...