...
...
We are constantly adding new indexing improvements to DSpace. In order to ensure your site gets all of these improvements, you should strive to keep it up-to-date. For example:
...
Provide a hidden link to the sitemaps in your DSpace's homepage. If you've customized your site's look and feel (as most have), ensure that there is a link to /htmlmap in your DSpace's front or home page. By default, both the JSPUI and XMLUI provide this link in the footer:
Code Block |
---|
<a href="/htmlmap"></a> |
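If you have customized your theme, one way to confirm the link survived is to scan the rendered home page for an anchor pointing at /htmlmap. The following is a minimal sketch using only Python's standard library; the helper names and the sample page fragment are hypothetical, and in practice you would feed it the HTML of your own home page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def links_to_htmlmap(html_text):
    """Return True if any anchor in html_text points at a path ending in /htmlmap."""
    collector = LinkCollector()
    collector.feed(html_text)
    return any(href.rstrip("/").endswith("/htmlmap") for href in collector.hrefs)


# Hypothetical home page fragment, for illustration only.
sample_page = '<html><body><footer><a href="/htmlmap"></a></footer></body></html>'
print(links_to_htmlmap(sample_page))
```

The check only looks at the href attribute, so it accepts both relative links (/htmlmap) and absolute links that end in /htmlmap.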
Announce your sitemap in your robots.txt. Most major search engines will also automatically discover your sitemap if you announce it in your robots.txt file. For example:
Code Block |
---|
Sitemap: http://my.dspace.url/sitemap
Sitemap: http://my.dspace.url/htmlmap |
Search engines will now look at your XML and HTML sitemaps, which serve pre-generated (and thus served with minimal impact on your hardware) XML or HTML files linking directly to items, collections and communities in your DSpace instance. Crawlers will not have to work their way through any browse screens, which are intended more for human consumption and are more expensive for the server.
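To double-check that the Sitemap lines in your robots.txt are actually picked up by a parser, you can read them back with Python's standard library. This is a small sketch, assuming Python 3.8 or later (where `RobotFileParser.site_maps()` is available); the robots.txt content and the helper name `announced_sitemaps` are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content; replace my.dspace.url with your own host.
robots_txt = """\
User-agent: *
Disallow: /discover

Sitemap: http://my.dspace.url/sitemap
Sitemap: http://my.dspace.url/htmlmap
"""


def announced_sitemaps(text):
    """Return the list of Sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(announced_sitemaps(robots_txt))
```

Sitemap lines are independent of any User-agent group, so they are reported regardless of where they appear in the file.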
...
Code Block |
---|
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
# This should be the FULL URL to your HTML Sitemap.
# Make sure to replace "[dspace.url]" with the value of your 'dspace.url' setting in your dspace.cfg file.
Sitemap: http://[dspace.url]/htmlmap
# If you have configured DSpace (Solr-based) Statistics to be publicly accessible,
# then you likely do not want this content to be indexed
# Disallow: /displaystats
# Uncomment the following line ONLY if sitemaps.org or HTML sitemaps are used
# and you have verified that your site is being indexed correctly.
# Disallow: /browse
# You also may wish to disallow access to the following paths, in order
# to stop web spiders from accessing user-based content:
# Disallow: /advanced-search
# Disallow: /contact
# Disallow: /feedback
# Disallow: /forgot
# Disallow: /login
# Disallow: /register
# Disallow: /search |
Note that for your additional disallow statements to be recognized under the User-agent: * group, they cannot be separated from the declared User-agent: * block by blank lines. A blank line indicates the start of a new user agent block. Blocks without a leading User-agent declaration on the first line are ignored. Comment lines are allowed and will not break the user-agent block.
This is OK:
Code Block |
---|
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
Disallow: /displaystats
Disallow: /advanced-search |
This is not OK, as the two lines at the bottom will be completely ignored.
Code Block |
---|
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter

Disallow: /displaystats
Disallow: /advanced-search |
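The difference between a contiguous group and one split by a blank line can be checked locally. The sketch below uses Python's `urllib.robotparser`, which is only one parser implementation and is used here as an assumption about how a strict crawler might behave; real search engine crawlers may be more lenient. The helper name `blocked` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# All four rules in one uninterrupted User-agent group.
CONTIGUOUS = """\
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
Disallow: /displaystats
Disallow: /advanced-search
"""

# Same rules, but a blank line splits the group after /search-filter.
SPLIT = CONTIGUOUS.replace(
    "Disallow: /search-filter\n",
    "Disallow: /search-filter\n\n",
)


def blocked(robots_txt, path):
    """Return True if the given path is disallowed for the '*' user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", path)


for path in ("/discover", "/displaystats", "/advanced-search"):
    print(path, "contiguous:", blocked(CONTIGUOUS, path), "split:", blocked(SPLIT, path))
```

With this parser, the contiguous variant should report all three paths as blocked, while the split variant should report /displaystats and /advanced-search as not blocked, because the rules after the blank line no longer belong to any User-agent group.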
To identify if a specific user agent has access to a particular URL, you can use this handy robots.txt tester.
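If you prefer to test from the command line rather than an online tool, a rough local equivalent can be put together with Python's standard library. This is a sketch only: the function name `check_access`, the "Googlebot" user agent string and the /handle/123456789/1 path are illustrative assumptions, and the result reflects Python's parser rather than any particular search engine.

```python
from urllib.robotparser import RobotFileParser


def check_access(robots_txt, user_agent, paths):
    """Map each path to True (crawlable) or False (disallowed) for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}


# Minimal rules taken from the default example; adjust to match your own file.
robots_txt = """\
User-agent: *
Disallow: /discover
Disallow: /search-filter
"""

result = check_access(
    robots_txt,
    "Googlebot",
    ["/discover", "/search-filter", "/handle/123456789/1"],
)
for path, allowed in result.items():
    print(path, "allowed" if allowed else "disallowed")
```

To test a live site instead of a string, `RobotFileParser` also offers `set_url()` followed by `read()`, which fetches the robots.txt over HTTP before you call `can_fetch()`.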
...
Much more information is available in the Configuration section on Google Scholar Metadata Mappings.
Make sure that you never redirect "direct file downloads" (i.e. users who directly jump to downloading a file, often from a search engine) to the associated Item's splash/landing page. In the past, some DSpace sites have added these custom URL redirects in order to facilitate capturing statistics via Google Analytics or similar.
While these URL redirects may seem harmless, they may be flagged as cloaking or spam by Google, Google Scholar and other major search engines. This may hurt your site's search engine ranking or even cause your entire site to be flagged for removal from the search engine.
If you have these URL redirects in place, it is highly recommended to remove them immediately. If you created these redirects to facilitate capturing download statistics in Google Analytics, you should consider upgrading to DSpace 5.0 or above, which is able to automatically record bitstream downloads in Google Analytics (see DS-2088) without the need for any URL redirects.
...