...

Code Block
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
# This should be the FULL URL to your HTML Sitemap.
# Make sure to replace "[dspace.url]" with the value of your 'dspace.url' setting in your dspace.cfg file.
Sitemap: http://[dspace.url]/htmlmap
# If you have configured DSpace (Solr-based) Statistics to be publicly accessible,
# then you likely do not want this content to be indexed
# Disallow: /displaystats
# Uncomment the following line ONLY if sitemaps.org or HTML sitemaps are used
# and you have verified that your site is being indexed correctly.
# Disallow: /browse
# You also may wish to disallow access to the following paths, in order
# to stop web spiders from accessing user-based content:
# Disallow: /advanced-search
# Disallow: /contact
# Disallow: /feedback
# Disallow: /forgot
# Disallow: /login
# Disallow: /register
# Disallow: /search

Note that for your additional Disallow statements to be recognized under the User-agent: * group, they cannot be separated from the User-agent: * line by blank lines. A blank line indicates the start of a new user agent block, and a block without a leading User-agent declaration on its first line is ignored. Comment lines are allowed and do not break the user agent block.

This is OK:

Code Block
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover 
Disallow: /search-filter
Disallow: /displaystats
Disallow: /advanced-search

This is not OK: the blank line ends the User-agent: * block, so the two lines at the bottom will be completely ignored.

Code Block
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter

Disallow: /displaystats
Disallow: /advanced-search

To check whether a specific user agent has access to a particular URL, you can use this handy robots.txt tester.
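As a rough local check, the two examples above can also be run through the robots.txt parser in Python's standard library (urllib.robotparser), which treats a blank line as the end of a user agent block. This is only a sketch: the helper name blocked_paths and the constants are made up for illustration, and real crawlers may parse blank lines differently from this library.

```python
from urllib.robotparser import RobotFileParser

# The "OK" example: all Disallow lines sit directly under User-agent: *
GROUPED = """\
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
Disallow: /displaystats
Disallow: /advanced-search
"""

# The "not OK" example: a blank line separates the last two Disallow lines
SPLIT = """\
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter

Disallow: /displaystats
Disallow: /advanced-search
"""

PATHS = ["/discover", "/search-filter", "/displaystats", "/advanced-search"]


def blocked_paths(robots_txt, paths, agent="*"):
    """Return the paths that the given agent may NOT fetch (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; an empty string marks a blank line
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


if __name__ == "__main__":
    print("grouped:", blocked_paths(GROUPED, PATHS))
    print("split:  ", blocked_paths(SPLIT, PATHS))
```

With this parser, the grouped file blocks all four paths, while the split file only blocks /discover and /search-filter. The comment line does not end the block in either case.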

Ensure Item Metadata appears in the HTML HEAD

...