The spiders directory itself contains a number of files provided by iplists.com. These files contain network address patterns that identify a number of known indexing services and other spiders. You can add your own files here to exclude additional addresses. If you do, you must also add your files' names to the list configured in config/modules/solr-statistics.cfg. The iplists.com-*.txt files can be updated using a tool provided by DSpace; see SOLR Statistics for details.
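The following is a minimal sketch of how address-pattern files like these could be loaded and checked against a client address. It is not DSpace's implementation. It assumes that each non-blank line holds one IPv4 pattern, that text after "#" is a comment, and that an entry with fewer than four octets (e.g. 192.0.2) is a prefix matching every address that starts with those octets. The function names parse_ip_patterns, load_spider_ips and is_spider_ip are hypothetical, and only IPv4 is handled.

```python
import ipaddress
from pathlib import Path
from typing import Iterable, List, Tuple

IpPattern = Tuple[int, ...]


def parse_ip_patterns(lines: Iterable[str]) -> List[IpPattern]:
    """Parse IPv4 patterns, one per line (hypothetical format, see lead-in).

    Text after '#' is treated as a comment; blank lines are skipped.
    An entry with 1-3 octets is kept as a prefix pattern.
    """
    patterns: List[IpPattern] = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # int() raises ValueError for non-numeric parts such as "xxx".
        octets = tuple(int(part) for part in line.split("."))
        if not 1 <= len(octets) <= 4 or any(not 0 <= o <= 255 for o in octets):
            raise ValueError(f"bad IP pattern: {raw!r}")
        patterns.append(octets)
    return patterns


def load_spider_ips(spiders_dir: Path, filenames: List[str]) -> List[IpPattern]:
    """Read only the files whose names are listed (cf. solr-statistics.cfg)."""
    patterns: List[IpPattern] = []
    for name in filenames:
        text = (spiders_dir / name).read_text(encoding="utf-8")
        patterns.extend(parse_ip_patterns(text.splitlines()))
    return patterns


def is_spider_ip(address: str, patterns: List[IpPattern]) -> bool:
    """True if the address equals, or starts with, any pattern's octets."""
    # IPv4Address validates the string; .packed yields its 4 bytes as ints.
    octets = tuple(ipaddress.IPv4Address(address).packed)
    return any(octets[: len(p)] == p for p in patterns)


if __name__ == "__main__":
    # Illustrative patterns from documentation address ranges, not real spiders.
    sample = parse_ip_patterns(["# sample list", "192.0.2", "198.51.100.7"])
    print(is_spider_ip("192.0.2.44", sample))
    print(is_spider_ip("198.51.100.8", sample))
```

Only files named in the configured list are read here, mirroring the requirement above that added files must also be listed in the configuration.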

The spiders directory also contains two subdirectories, agents and domains. The agents subdirectory contains files of regular expressions, one per line. An incoming request's User-Agent header is tested against each expression in these files until one matches. If an expression matches, the request is marked as coming from a spider; otherwise it is not. The domains subdirectory similarly contains files of regular expressions, which are tested against the domain name from which the request comes. You may add your own files of regular expressions to either directory to test requests against patterns of your own.
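As a rough illustration of the matching rule described above, the sketch below compiles every line of every file in a directory as a regular expression and stops at the first match. It is not DSpace's code. Several details are assumptions: lines are stripped and blank lines skipped, patterns are applied with re.search (so they may match anywhere in the string rather than the whole string), matching is case-sensitive unless a pattern says otherwise, and the hostname is supplied by the caller (how the domain name is obtained is outside this sketch). The names load_patterns, matches_any and is_spider are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Optional, Pattern


def load_patterns(directory: Path) -> List[Pattern[str]]:
    """Compile each non-blank line of each file in `directory` as a regex.

    Assumptions: every regular file is read (sorted by name), lines are
    stripped, and blank lines are skipped. A missing directory yields [].
    """
    patterns: List[Pattern[str]] = []
    if not directory.is_dir():
        return patterns
    for path in sorted(p for p in directory.iterdir() if p.is_file()):
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:
                patterns.append(re.compile(line))
    return patterns


def matches_any(value: Optional[str], patterns: List[Pattern[str]]) -> bool:
    """Test `value` against each pattern; any() stops at the first match."""
    if not value:
        return False
    return any(p.search(value) for p in patterns)


def is_spider(
    user_agent: Optional[str],
    hostname: Optional[str],
    agent_patterns: List[Pattern[str]],
    domain_patterns: List[Pattern[str]],
) -> bool:
    """A request counts as a spider if its User-Agent or domain matches."""
    return matches_any(user_agent, agent_patterns) or matches_any(
        hostname, domain_patterns
    )


if __name__ == "__main__":
    # Illustrative patterns only; real ones would come from load_patterns(
    # Path(".../spiders/agents")) and load_patterns(Path(".../spiders/domains")).
    agents = [re.compile(r"(?i)bot"), re.compile(r"^Wget/")]
    domains = [re.compile(r"\.crawler\.example\.com$")]
    print(is_spider("Mozilla/5.0 (compatible; ExampleBot/1.0)", None, agents, domains))
    print(is_spider("Mozilla/5.0 (X11; Linux)", "host.example.org", agents, domains))
```

Because each file is simply a list of expressions, adding your own file to either directory extends the list without changing any existing file.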