All Versions
- DSpace 7.x (Current Release)
- DSpace 8.x (Unreleased)
- DSpace 6.x (EOL)
- DSpace 5.x (EOL)
- More Versions...
...
...
Code Block |
---|
User-agent: * # Disable access to Discovery search and filters Disallow: /discover Disallow: /search-filter # This should be the FULL URL to your HTML Sitemap. # Make sure to replace "[dspace.url]" with the value of your 'dspace.url' setting in your dspace.cfg file. Sitemap: http://[dspace.url]/htmlmap # If you have configured DSpace (Solr-based) Statistics to be publicly accessible, # then you likely do not want this content to be indexed # Disallow: /displaystats # Uncomment the following line ONLY if sitemaps.org or HTML sitemaps are used # and you have verified that your site is being indexed correctly. # Disallow: /browse # You also may wish to disallow access to the following paths, in order # to stop web spiders from accessing user-based content: # Disallow: /advanced-search # Disallow: /contact # Disallow: /feedback # Disallow: /forgot # Disallow: /login # Disallow: /register # Disallow: /search |
Note that for your additional disallow statements to be recognized under the User-agent: * group, they can not be separated by white lines from the declared user-agent: * block. A white line indicates the start of a new user agent block. Without a leading user-agent declaration on the first line, blocks are ignored. Comment lines are allowed and will not break the user-agent block.
This is OK:
Code Block |
---|
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
Disallow: /displaystats
Disallow: /advanced-search |
This is not OK, as the two lines at the bottom will be completely ignored.
Code Block |
---|
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
Disallow: /displaystats
Disallow: /advanced-search |
To identify if a specific user agent has access to a particular URL, you can use this handy robots.txt tester.
...
Much more information is available in the Configuration section on Google Scholar Metadata Mappings.
Make sure that you never redirect "direct file downloads" (i.e. users who directly jump to downloading a file, often from a search engine) to the associated Item's splash/landing page. In the past, some DSpace sites have added these custom URL redirects in order to facilitate capturing statistics via Google Analytics or similar.
While these URL redirects may seem harmless, they may be flagged as cloaking or spam by Google, Google Scholar and other major search engines. This may hurt your site's search engine ranking or even cause your entire site to be flagged for removal from the search engine.
If you have these URL redirects in place, it is highly recommended to remove them immediately. If you created these redirects to facilitate capturing download statistics in Google Analytics, you should consider upgrading to DSpace 5.0 or above, which is able to automatically record bitstream downloads in Google Analytics (see DS-2088) without the need for any URL redirects.
...