...

We are constantly adding new indexing improvements to DSpace. To ensure your site gets all of these improvements, keep it as up-to-date as you can. For example:

  • As of DSpace 5.0, the DSpace robots.txt file now includes references to Sitemaps by default (see DS-1936), and also blocks known bad bots (see DS-2335).
  • As of DSpace 4.0, DSpace provides several enhancements requested by the Google Scholar team: a way for users (and web indexers) to browse content by the date it was added to DSpace (see DS-1482), more accurate setting of the "dc.date.issued" field (see DS-1481), and improved logic behind the "citation_pdf_url" HTML <meta> tag (see DS-1483).
  • As of DSpace 1.7, DSpace has improved how its Item-level metadata is made available to Google Scholar. For the 1.7.0 release, the DSpace developers worked directly with the Google Scholar developers to ensure DSpace generates the "citation_*" HTML <meta> tags (i.e. Highwire Press tags) that Google Scholar recommends in its Indexing Guidelines.
  • As of DSpace 1.5, DSpace has support for sitemaps (both simple HTML pages of links, as well as the sitemaps.org protocol). It also includes item metadata in the HTML HEAD element of item display pages, ensuring that the metadata can be effectively indexed no matter what changes you might have made to your DSpace's layout or style.
  • As of DSpace 1.4, DSpace supports the "If-Modified-Since" HTTP header. If an item (or a bitstream within it) has not changed since a search engine's crawler last indexed it, the item or bitstream does not have to be retrieved again, which spares your server.
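To make the "If-Modified-Since" behavior concrete, here is a minimal Python sketch of the decision a server makes for a conditional request. It is an illustration only, not DSpace's actual implementation; the function name `conditional_status` and the sample dates are assumptions made up for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the resource is unchanged since the crawler's copy, else 200.

    `last_modified` is a timezone-aware datetime; `if_modified_since` is the raw
    header value sent by the crawler (or None if the header is absent).
    """
    if not if_modified_since:
        return 200  # no header sent: serve the full response
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and serve the full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    return 304 if last_modified.replace(microsecond=0) <= since else 200


# Hypothetical item last changed on 1 March 2015.
item_changed = datetime(2015, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
# A crawler that fetched the item afterwards, and one whose copy is older.
fresh_copy = format_datetime(datetime(2015, 3, 2, 8, 0, 0, tzinfo=timezone.utc), usegmt=True)
stale_copy = format_datetime(datetime(2015, 2, 1, 8, 0, 0, tzinfo=timezone.utc), usegmt=True)

print(conditional_status(item_changed, fresh_copy))  # crawler's copy is current
print(conditional_status(item_changed, stale_copy))  # item changed since the crawl
```

A 304 response carries no body, which is where the saving comes from: the crawler keeps its existing copy and the server skips sending the item or bitstream again.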

...

  1. Provide a hidden link to the sitemaps on your DSpace homepage. If you've customized your site's look and feel (as most have), ensure that there is a link to /htmlmap on your DSpace's front or home page. By default, both the JSPUI and XMLUI provide this link in the footer:

    <a href="/htmlmap"></a>
  2. Announce your sitemap in your robots.txt. Most major search engines will automatically discover your sitemap if you announce it in your robots.txt file. By default, both the JSPUI and XMLUI provide these references in their robots.txt file. For example:

    # The FULL URL to the DSpace sitemaps
    # XML sitemap is listed first as it is preferred by most search engines
    # Make sure to replace "[dspace.url]" with the value of your 'dspace.url' setting in your dspace.cfg file.
    Sitemap: [dspace.url]/sitemap
    Sitemap: [dspace.url]/htmlmap
    1. These "Sitemap:" lines can be placed anywhere in your robots.txt file. You can also specify multiple "Sitemap:" lines, so that search engines can locate both formats. For more information, see: http://www.sitemaps.org/protocol.html#informing
    2. Be sure to include the FULL URL in the "Sitemap:" line. Relative paths are not supported.
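Because relative paths in "Sitemap:" lines are not supported, it can help to generate or check these lines programmatically. The following Python sketch is one possible approach, not part of DSpace: the helper names `sitemap_lines` and `relative_sitemaps` and the host `repo.example.org` are hypothetical and stand in for your own 'dspace.url' value.

```python
from urllib.parse import urlsplit


def sitemap_lines(dspace_url):
    """Build the two "Sitemap:" lines for a given dspace.url value."""
    base = dspace_url.rstrip("/")  # avoid a double slash before the path
    # XML sitemap first, then the HTML sitemap, matching the default robots.txt.
    return [f"Sitemap: {base}/sitemap", f"Sitemap: {base}/htmlmap"]


def relative_sitemaps(robots_txt):
    """Return every "Sitemap:" value that is not a full http(s) URL."""
    bad = []
    for line in robots_txt.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap":
            value = value.strip()
            parts = urlsplit(value)
            if parts.scheme not in ("http", "https") or not parts.netloc:
                bad.append(value)
    return bad


# Hypothetical dspace.url; replace with the value from your dspace.cfg.
robots = "\n".join(["User-agent: *"] + sitemap_lines("https://repo.example.org/"))
print(robots)
print(relative_sitemaps(robots))                 # full URLs: nothing flagged
print(relative_sitemaps("Sitemap: /sitemap\n"))  # relative path: flagged
```

A check like this will also flag a "Sitemap:" line where the "[dspace.url]" placeholder was never replaced, since the placeholder is not a full URL.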

...