Watch the DSpace Discovery introduction video

Info

Since DSpace 6.0, Discovery is the only out-of-the-box Search and Browse infrastructure provided in DSpace.

What is a Sidebar Facet

From the user perspective, faceted search (also called faceted navigation, guided navigation, or parametric search) breaks up search results into multiple categories, typically showing counts for each, and allows the user to "drill down" or further restrict their search results based on those facets.

...

 This is a classic "tag cloud" facet in a DSpace repository.

Discovery Changelist

DSpace 6.0

The legacy search engine (based on Apache Lucene) and legacy Browse system (based on database tables) have been removed in DSpace 6.0 and above. Instead, DSpace now uses only Discovery for all Search/Browse capabilities.

In addition, to support the new Configuration options, all of the Discovery configurations in discovery.cfg have been prefixed with "discovery." (see configuration below).

DSpace 5.0

The new JSPUI-only tag cloud facet feature is disabled by default. In order to enable it, you will need to set up the corresponding processor that the PluginManager will load to actually perform the tag cloud query on the relevant pages. This is configured in the dspace.cfg configuration file using the following properties:

...

  • Browse interfaces now also use Discovery index (rather than the legacy, now retired, Lucene index)
  • "Did you mean" spell check aid for search

...

  • Sidebar browse facets that can be configured to use contents from any metadata field
    • Dynamically generated timespans for dates
  • Customizable "recent submissions" view on the repository homepage, collection and community pages
  • Hit highlighting & search snippets

Enabling Discovery

Because Discovery was adopted as the default infrastructure for search and browse in DSpace 4, no manual steps are required to enable Discovery. If you want to enable Discovery on older versions of DSpace, please refer to the DSpace documentation for that particular version.

Removing Legacy Browse Tables (bi_*) from your Database

If you have upgraded from an older version of DSpace, your database may still include outdated "bi_*" tables (where "bi" = "browse index").  When Discovery is enabled, these tables are no longer necessary, as Discovery takes over this browse index function.

To clean up all these old "bi_*" tables, simply run:

Code Block
[dspace]/bin/dspace index-db-browse -f -d

Configuration files

The configuration for discovery is located in 2 separate files.

  • General settings: The discovery.cfg file located in the [dspace-install-dir]/config/modules directory.
  • User Interface Configuration: The discovery.xml file is located in [dspace-install-dir]/config/spring/api/ directory.

General Discovery settings (config/modules/discovery.cfg)

The discovery.cfg file is located in the [dspace]/config/modules directory and contains the following properties. Any of these properties may be overridden in your local.cfg (see Configuration Reference):

Property:

discovery.search.server

Example Value:

discovery.search.server=http://localhost:8080/solr/search

Informational Note:

Discovery relies on a Solr index for storage and retrieval of its information. This parameter determines the location of the Solr index.

If you are uncertain whether this property is set correctly, you can use a command-line tool like "wget" to query the Solr index and confirm that Solr responds. For example, the query below searches the Solr index for "test" and writes the response to standard output:

wget -O - http://localhost:8080/solr/search/select?q=test

Property:

discovery.index.authority.ignore[.field]

Example Value:

discovery.index.authority.ignore=true

discovery.index.authority.ignore.dc.contributor.author=false

Informational Note:

By default, Discovery will use the authority information in the metadata to disambiguate homonyms. Setting this property to false will make the indexing process the same as if the metadata didn't include authority information. The configuration can differ on a per-field (<schema>.<element>.<qualifier>) basis; the property without a field sets the default value.

Property:

discovery.index.authority.ignore-prefered[.field]

Example Value:

discovery.index.authority.ignore-prefered=true

discovery.index.authority.ignore-prefered.dc.contributor.author=false

Informational Note:

By default, Discovery will use the authority information in the metadata to query the authority for the preferred label. Setting this property to false will make the indexing process the same as if the metadata didn't include authority information (i.e. the preferred form is the one recorded in the metadata value). The configuration can differ on a per-field (<schema>.<element>.<qualifier>) basis; the property without a field sets the default value. If the authority is a remote service, disabling this feature can greatly improve performance.

Property:

discovery.index.authority.ignore-variants[.field]

Example Value:

discovery.index.authority.ignore-variants=true

discovery.index.authority.ignore-variants.dc.contributor.author=false

Informational Note:

By default, Discovery will use the authority information in the metadata to query the authority for variants. Setting this property to false will make the indexing process the same as if the metadata didn't include authority information. The configuration can differ on a per-field (<schema>.<element>.<qualifier>) basis; the property without a field sets the default value. If the authority is a remote service, disabling this feature can greatly improve performance.

...

The discovery.xml file is located in the [dspace-install-dir]/config/spring/api directory.

...

Customizing hit highlighting & search snippets

The hit highlighting configuration element contains all settings necessary to display search snippets & enable hit highlighting.

Warning

This section only applies to XMLUI. JSPUI does not currently support "highlighting & search snippets".


Info

Disabling hit highlighting / search snippets

You can disable hit highlighting / search snippets by commenting out the entire <property name="hitHighlightingConfiguration"> configuration in the [dspace]/config/spring/api/discovery.xml configuration file.

Please be aware that this <property> definition exists in two places, and you should comment out both: one is under <bean id="defaultConfiguration"> and one is under <bean id="homepageConfiguration">.

Alternatively, you may choose to tweak which fields are shown in hit highlighting, or modify the number of matching words shown (snippets) and/or the number of characters shown around the matching word (maxSize).

For this change to take effect in the User Interface, you will need to restart Tomcat.
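
As a minimal sketch of the commenting-out step, the fragment below wraps an abbreviated definition in a single XML comment. It is cut down to one field for brevity, so the real file will list more. XML comments cannot be nested, so any comments already inside the block need to be removed before wrapping it.

```xml
<!-- Hit highlighting disabled: the whole property is wrapped in one XML comment.
     Repeat this in both defaultConfiguration and homepageConfiguration.
<property name="hitHighlightingConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightingConfiguration">
        <property name="metadataFields">
            <list>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.title"/>
                    <property name="snippets" value="5"/>
                </bean>
            </list>
        </property>
    </bean>
</property>
-->
```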


Note

Changes made to the configuration will not automatically be displayed in the user interface. By default, only the following fields are displayed: dc.title, dc.contributor.author, dc.creator, dc.contributor, dc.date.issued, dc.publisher, dc.description.abstract and fulltext.

If additional fields are required, look for the "itemSummaryList" template.

Below is an example configuration of hit highlighting.

Code Block
languagehtml/xml
<property name="hitHighlightingConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightingConfiguration">
        <property name="metadataFields">
            <list>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.title"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.contributor.author"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.subject"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.description.abstract"/>
                    <!-- Max number of characters to display around the matching word (Warning setting to 0 returns entire field) -->
                    <property name="maxSize" value="250"/>
                    <!-- Max number of snippets (matching words) to show -->
                    <property name="snippets" value="2"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <!-- Displays snippets from indexed full text of document (for supported formats) -->
                    <property name="field" value="fulltext"/>
                    <!-- Max number of characters to display around the matching word (Warning setting to 0 returns entire field) -->
                    <property name="maxSize" value="250"/>
                    <!-- Max number of snippets (matching words) to show -->
                    <property name="snippets" value="2"/>
                </bean>
            </list>
        </property>
    </bean>
</property>

The property name & the bean class are mandatory. The property field names are:

  • field (mandatory): The metadata field to be highlighted (can also be * if all the metadata fields should be highlighted).
  • maxSize (optional): Limit the number of characters displayed to only the relevant part (use metadata field as search snippet).
  • snippets (optional): The maximum number of snippets that can be found in one metadata field.
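
As an illustration of the options listed above, the sketch below highlights every metadata field by using the * wildcard. The maxSize and snippets values are illustrative assumptions rather than recommended defaults.

```xml
<bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
    <!-- "*" applies highlighting to all metadata fields -->
    <property name="field" value="*"/>
    <!-- Illustrative value: limit each snippet to 250 characters -->
    <property name="maxSize" value="250"/>
    <!-- Illustrative value: show at most 2 snippets per metadata field -->
    <property name="snippets" value="2"/>
</bean>
```

A bean like this would sit inside the <list> of the metadataFields property, alongside or instead of the per-field beans.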

Hit highlighting technical details

The org.dspace.discovery.DiscoveryQuery object has a setter & getter for the hit highlighting configuration set in the Discovery configuration. If this configuration is given, the resolveToSolrQuery method located in the org.dspace.discovery.SolrServiceImpl class will use the standard Solr highlighting feature (http://wiki.apache.org/solr/HighlightingParameters). The org.dspace.discovery.DiscoverResult class has a method to set the highlighted fields for each object & field.

The rendering of search results is no longer handled by the METS format but uses a special type of list named "TYPE_DSO_LIST". Each metadata field (& fulltext if configured) is added in the DRI, and if the field contains hit highlighting, the Java code will split up the string & add DRI highlights to the list. The XSL for the themes also contains special rendering XSL for the DRI; for Mirage, the changes are located in the discovery.xsl file. For themes based on the old structural.xsl, look for the template matching "dri:list[@type='dsolist']".

"More like this" configuration

Warning

This section only applies to XMLUI. The JSPUI does not currently support the "More like this" feature.

The "more like this"-configuration element contains all the settings for displaying related items on an item display page.
Below is an example of the "more like this" configuration.

Code Block
languagehtml/xml
<property name="moreLikeThisConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoveryMoreLikeThisConfiguration">
        <property name="similarityMetadataFields">
            <list>
                <value>dc.title</value>
                <value>dc.contributor.author</value>
                <value>dc.creator</value>
                <value>dc.subject</value>
            </list>
        </property>
        <!--The minimum number of matching terms across the metadata fields above before an item is found as related -->
        <property name="minTermFrequency" value="5"/>
        <!--The maximum number of related items displayed-->
        <property name="max" value="3"/>
        <!--The minimum word length below which words will be ignored-->
        <property name="minWordLength" value="5"/>
    </bean>
</property>

...

Class         Note
tagcloud      General class for the whole tagcloud
tagcloud_1    Specific tag class for tags of type 1 (based on score)
tagcloud_2    Specific tag class for tags of type 2 (based on score)
tagcloud_3    Specific tag class for tags of type 3 (based on score)

Disabling the "Has file(s)" facet

Since DSpace 6, a new "Has file(s)" facet has been enabled by default. This facet shows whether items have or do not have any bitstreams in the "ORIGINAL" bundle.
Should you want to turn this off, you can edit [dspace]/config/spring/api/discovery.xml to remove the following line from the defaultConfiguration and homepageConfiguration beans (in the sidebarFacets property):

 

Code Block
languagexml
<ref bean="searchFilterContentInOriginalBundle"/>

Then restart your servlet container.
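
As a sketch of the edit described above, the fragment below shows a sidebarFacets list with the "Has file(s)" reference commented out, which has the same effect as removing the line. The two neighbouring facet names are assumed examples; the entries in your own file may differ and should be left as they are.

```xml
<property name="sidebarFacets">
    <list>
        <!-- Assumed example entries; keep whatever your file already lists -->
        <ref bean="searchFilterAuthor"/>
        <ref bean="searchFilterSubject"/>
        <!-- "Has file(s)" facet disabled:
        <ref bean="searchFilterContentInOriginalBundle"/>
        -->
    </list>
</property>
```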

Discovery Solr Index Maintenance

Command used:

[dspace]/bin/dspace index-discovery [-cbhf[r <item handle>]]

Java class:

org.dspace.discovery.IndexClient

Arguments (short and long forms):

Description

(none)

called without any options, will update/clean an existing index

-b

(re)build index, wiping out current one if it exists

-c

clean existing index removing any documents that no longer exist in the db

-f

if updating existing index, force each handle to be reindexed even if uptodate

-h

print this help message

-i <object handle>

Reindex an individual object (and any child objects). When run on an Item, it just reindexes that single Item. When run on a Collection, it reindexes the Collection itself and all Items in that Collection. When run on a Community, it reindexes the Community itself and all sub-Communities, contained Collections and contained Items.

-o

optimize search core

-r <item handle>

remove an Item, Collection or Community from index based on its handle

-s

Rebuild the spellchecker, can be combined with -b and -f.

Routine Discovery Solr Index Maintenance

It is strongly recommended to run maintenance on the Discovery Solr index occasionally (from crontab or your system's scheduler), to prevent your servlet container from running out of memory:

[dspace]/bin/dspace index-discovery -o

(Since Solr 4, the underlying optimize operation has been discouraged as mostly unnecessary and renamed. See https://issues.apache.org/jira/browse/SOLR-3141).

Advanced Solr Configuration

Discovery is built as an application layer on top of the Solr open source enterprise search server. Therefore, Solr configuration can be applied to the Solr cores that are shipped with DSpace.
The DSpace Solr instance itself currently runs several cores (which means indexes in Solr parlance). The "statistics" core is for collection of DSpace usage events for statistical purposes (if you have been collecting statistics for multiple years, you may have chosen to use sharding and you will see one core per year collected). The "search" core is used by Discovery for search and faceting, and for displaying the collection/community hierarchy and item counts. The "authority" core is used by SolrAuthority to store information about authors, including their data imported from the ORCID registry.

Code Block
solr
├── solr.xml
├── search
│   └── conf
│       ├── admin-extra.html
│       ├── elevate.xml
│       ├── protwords.txt
│       ├── schema.xml
│       ├── scripts.conf
│       ├── solrconfig.xml
│       ├── spellings.txt
│       ├── stopwords.txt
│       ├── synonyms.txt
│       └── xslt
│           ├── DRI.xsl
│           ├── example.xsl
│           ├── example_atom.xsl
│           ├── example_rss.xsl
│           └── luke.xsl
├── ...
└── statistics
    └── conf
        ├── admin-extra.html
        ├── elevate.xml
        ├── protwords.txt
        ├── schema.xml
        ├── scripts.conf
        ├── solrconfig.xml
        ├── spellings.txt
        ├── stopwords.txt
        ├── synonyms.txt
        └── xslt
            ├── example.xsl
            ├── example_atom.xsl
            ├── example_rss.xsl
            └── luke.xsl

Internationalization

Discovery has its own messages.xml file, located at dspace-xmlui/src/main/resources/aspects/Discovery/i18n/messages.xml.  To add your own labels for new fields and facets in a Maven overlay, copy this file to dspace/modules/xmlui/src/main/resources/aspects/Discovery/i18n/messages.xml and modify this file. Alternatively, you may add them to the main messages.xml file. Same goes for translations - it's encouraged to submit a single messages_XX.xml file including messages from all the separate messages.xml files in DSpace.

Advanced search related keys (change "author" to desired field)

Filter name: xmlui.ArtifactBrowser.SimpleSearch.filter.author
Facet heading: xmlui.ArtifactBrowser.AdvancedSearch.type_author
"Filter by" page heading: xmlui.Discovery.AbstractSearch.type_author
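
As a sketch of how these keys might be used for a new field, the fragment below adds labels for a hypothetical "subject" filter to messages.xml. The field name and the label texts are assumptions for illustration; the catalogue/message layout is the one used by the existing messages.xml files.

```xml
<catalogue xml:lang="en" xmlns:i18n="http://apache.org/cocoon/i18n/2.1">
    <!-- Filter name -->
    <message key="xmlui.ArtifactBrowser.SimpleSearch.filter.subject">Subject</message>
    <!-- Facet heading -->
    <message key="xmlui.ArtifactBrowser.AdvancedSearch.type_subject">Subject</message>
    <!-- "Filter by" page heading -->
    <message key="xmlui.Discovery.AbstractSearch.type_subject">Filter by: Subject</message>
</catalogue>
```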