...

  1. Keep your DSpace up to date. We are constantly adding new indexing improvements in new releases.
  2. Ensure your DSpace is visible to search engines.
  3. Enable the sitemaps feature. This does not require, for example, registering with Google Webmaster Tools.
  4. Ensure your robots.txt allows access to item "splash" pages and full text.
  5. Ensure item metadata appears in HTML headers correctly.
  6. Avoid redirecting file downloads to Item landing pages.
  7. As an aside, OAI-PMH is generally not useful to search engines. It has its own uses, but do not expect search engines to use it.

...

Code Block
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover 
Disallow: /search-filter

# This should be the FULL URL to your HTML Sitemap.  
# Make sure to replace "[dspace.url]" with the value of your 'dspace.url' setting in your dspace.cfg file.
Sitemap: [dspace.url]/htmlmap

# If you have configured DSpace (Solr-based) Statistics to be publicly accessible,
# then you likely do not want this content to be indexed
# Disallow: /displaystats

# Uncomment the following line ONLY if sitemaps.org or HTML sitemaps are used
# and you have verified that your site is being indexed correctly.
# Disallow: /browse

# You also may wish to disallow access to the following paths, in order
# to stop web spiders from accessing user-based content:
# Disallow: /advanced-search
# Disallow: /contact
# Disallow: /feedback
# Disallow: /forgot
# Disallow: /login
# Disallow: /register
# Disallow: /search

Note that for your additional Disallow statements to be recognized under the User-agent: * group, they cannot be separated from the User-agent: * block by blank lines. A blank line indicates the start of a new user-agent block. A block without a leading User-agent declaration on its first line is ignored. Comment lines are allowed and will not break the user-agent block.

This is OK:

Code Block
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover 
Disallow: /search-filter
Disallow: /displaystats
Disallow: /advanced-search

This is not OK: the blank line ends the User-agent block, so the two Disallow lines at the bottom will be completely ignored.

Code Block
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover 
Disallow: /search-filter
 
Disallow: /displaystats
Disallow: /advanced-search
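
The grouping rule described above can be checked mechanically before a robots.txt file is deployed. The following is a minimal sketch in Python, not part of DSpace: the function name `find_orphaned_rules` is hypothetical, and it assumes that a whitespace-only line counts as a blank line and that only Allow and Disallow rules need to sit inside a User-agent block.

```python
def find_orphaned_rules(robots_txt: str) -> list[str]:
    """Return Allow/Disallow lines that sit outside any User-agent block.

    Assumption: a blank (or whitespace-only) line closes the current
    block, and comment-only lines do not.
    """
    orphaned = []
    in_group = False
    for raw in robots_txt.splitlines():
        if not raw.strip():
            # Blank line: the current User-agent block ends here.
            in_group = False
            continue
        # Drop any trailing comment, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            # Comment-only line: does not break the block.
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field == "user-agent":
            in_group = True
        elif field in ("allow", "disallow") and not in_group:
            orphaned.append(line)
    return orphaned


GOOD = """\
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
Disallow: /displaystats
Disallow: /advanced-search
"""

BAD = """\
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter

Disallow: /displaystats
Disallow: /advanced-search
"""

if __name__ == "__main__":
    print(find_orphaned_rules(GOOD))
    print(find_orphaned_rules(BAD))
```

An empty result means every rule belongs to a User-agent block. Any line in the result would be ignored under the rule described above.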

To identify if a specific user agent has access to a particular URL, you can use this handy robots.txt tester.
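
For a quick local check, Python's standard library includes `urllib.robotparser`, which can answer the same question offline. The sketch below is an illustration only: the helper `is_allowed` and the host `repository.example.org` are hypothetical, and Python's parser is not guaranteed to interpret the file exactly as any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the rules above.
    for url in (
        "http://repository.example.org/discover",
        "http://repository.example.org/handle/123456789/1",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Here the Discovery URL is reported as blocked and the item page as allowed, because only paths beginning with /discover or /search-filter are disallowed.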

Ensure Item Metadata appears in the HTML HEAD

...

Much more information is available in the Configuration section on Google Scholar Metadata Mappings.

Avoid redirecting file downloads to Item landing pages

Make sure that you never redirect "direct file downloads" (i.e. users who directly jump to downloading a file, often from a search engine) to the associated Item's splash/landing page.  In the past, some DSpace sites have added these custom URL redirects in order to facilitate capturing statistics via Google Analytics or similar.

While these URL redirects may seem harmless, they may be flagged as cloaking or spam by Google, Google Scholar and other major search engines. This may hurt your site's search engine ranking or even cause your entire site to be flagged for removal from the search engine.

If you have these URL redirects in place, we highly recommend removing them immediately. If you created these redirects to capture download statistics in Google Analytics, consider upgrading to DSpace 5.0 or above, which can automatically record bitstream downloads in Google Analytics (see DS-2088) without the need for any URL redirects.

In general, OAI-PMH is not useful to Search Engines

...