Watch the DSpace Discovery introduction video

Info

Since DSpace 6.0, Discovery is the only out-of-the-box Search and Browse infrastructure provided in DSpace.

What is a Sidebar Facet

From the user perspective, faceted search (also called faceted navigation, guided navigation, or parametric search) breaks up search results into multiple categories, typically showing counts for each, and allows the user to "drill down" or further restrict their search results based on those facets.

...

 This is a classic "tag cloud" facet in a DSpace repository.

Discovery Changelist

DSpace 6.0

The legacy search engine (based on Apache Lucene) and the legacy Browse system (based on database tables) were removed in DSpace 6.0. DSpace now uses only Discovery for all Search/Browse capabilities.

In addition, to support the new Configuration options, all of the Discovery configurations in discovery.cfg have been prefixed with "discovery." (see configuration below).

DSpace 5.0

The new JSPUI-only tag cloud facet feature is disabled by default. In order to enable it, you will need to set up the corresponding processor that the PluginManager will load to actually perform the tag cloud query on the relevant pages. This is configured in the dspace.cfg configuration file using the following properties:

...

  • Browse interfaces now also use Discovery index (rather than the legacy, now retired, Lucene index)
  • "Did you mean" spell check aid for search

...

  • Sidebar browse facets that can be configured to use contents from any metadata field
    • Dynamically generated timespans for dates
  • Customizable "recent submissions" view on the repository homepage, collection and community pages
  • Hit highlighting & search snippets

Enabling Discovery

Because Discovery was adopted as the default infrastructure for search and browse in DSpace 4, no manual steps are required to enable Discovery. If you want to enable Discovery on older versions of DSpace, please refer to the DSpace documentation for that particular version.

Removing Legacy Browse Tables (bi_*) from your Database

If you have upgraded from an older version of DSpace, your database may still include outdated "bi_*" tables (where "bi" = "browse index").  When Discovery is enabled, these tables are no longer necessary, as Discovery takes over this browse index function.

To clean up all these old "bi_*" tables, simply run:

Code Block
[dspace]/bin/dspace index-db-browse -f -d

Configuration files

Discovery is configured in two separate files:

  • General settings: The discovery.cfg file, located in the [dspace-install-dir]/config/modules directory.
  • User Interface Configuration: The discovery.xml file, located in the [dspace-install-dir]/config/spring/api/ directory.

General Discovery settings (config/modules/discovery.cfg)

The discovery.cfg file is located in the [dspace]/config/modules directory and contains the following properties. Any of these properties may be overridden in your local.cfg (see Configuration Reference):

Property:

discovery.search.server

Example Value:

discovery.search.server=http://localhost:8080/solr/search

Informational Note:

Discovery relies on a Solr index for storage and retrieval of its information. This parameter determines the location of the Solr index.

If you are uncertain whether this property is set correctly, you can use a command-line tool such as "wget" to query the Solr index and confirm that Solr responds. For example, the query below searches the Solr index for "test" and writes the response to standard output:

wget -O - "http://localhost:8080/solr/search/select?q=test"
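
The same check can be wrapped in a small shell function for reuse in scripts. The sketch below is illustrative only: the function names solr_select_url and solr_responds are hypothetical (they are not part of DSpace), and it assumes that wget is installed and that the URL passed in matches your discovery.search.server value.

```shell
#!/bin/sh
# Hypothetical helpers (not part of DSpace) for checking that the
# Solr core configured in discovery.search.server answers queries.

# Build a select URL from the Solr core URL ($1) and a query term ($2).
# A single trailing slash on the core URL is dropped. The query term is
# used as-is, so keep it to a simple word such as "test".
solr_select_url() {
    printf '%s/select?q=%s\n' "${1%/}" "$2"
}

# Return 0 if Solr answers the query, non-zero otherwise.
# wget exits with a non-zero status on connection or HTTP errors.
solr_responds() {
    wget -q -O /dev/null "$(solr_select_url "$1" "$2")"
}
```

For example, running solr_responds "http://localhost:8080/solr/search" test && echo "Solr responded" prints the message only when the query succeeds.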

Property:

discovery.index.authority.ignore[.field]

Example Value:

discovery.index.authority.ignore=true

discovery.index.authority.ignore.dc.contributor.author=false

Informational Note:

By default, Discovery uses the authority information in the metadata to disambiguate homonyms. Setting this property to true makes the indexing process behave as if the metadata did not include authority information. The setting can be overridden on a per-field (<schema>.<element>.<qualifier>) basis; the property without a field suffix sets the default value.

Property:

discovery.index.authority.ignore-prefered[.field]

Example Value:

discovery.index.authority.ignore-prefered=true

discovery.index.authority.ignore-prefered.dc.contributor.author=false

Informational Note:

By default, Discovery uses the authority information in the metadata to query the authority for the preferred label. Setting this property to true makes the indexing process behave as if the metadata did not include authority information (i.e. the preferred form is the one recorded in the metadata value). The setting can be overridden on a per-field (<schema>.<element>.<qualifier>) basis; the property without a field suffix sets the default value. If the authority is a remote service, disabling this feature can greatly improve performance.

Property:

discovery.index.authority.ignore-variants[.field]

Example Value:

discovery.index.authority.ignore-variants=true

discovery.index.authority.ignore-variants.dc.contributor.author=false

Informational Note:

By default, Discovery uses the authority information in the metadata to query the authority for variants. Setting this property to true makes the indexing process behave as if the metadata did not include authority information. The setting can be overridden on a per-field (<schema>.<element>.<qualifier>) basis; the property without a field suffix sets the default value. If the authority is a remote service, disabling this feature can greatly improve performance.

...

The discovery.xml file is located in the [dspace-install-dir]/config/spring/api directory.

...

Customizing hit highlighting & search snippets

The hit highlighting configuration element contains all settings necessary to display search snippets & enable hit highlighting.

Warning

This section only applies to XMLUI. JSPUI does not currently support "highlighting & search snippets".


Info

Disabling hit highlighting / search snippets

You can disable hit highlighting / search snippets by commenting out the entire <property name="hitHighlightingConfiguration"> configuration in the [dspace]/config/spring/api/discovery.xml configuration file.

PLEASE BE AWARE that this <property> definition exists in two places, and you should comment out both: one is under the <bean id="defaultConfiguration"> and one is under the <bean id="homepageConfiguration">.

Alternatively, you may also choose to tweak which fields are shown in hit highlighting, or modify the number of matching words shown (snippets) and/or number of characters shown around the matching word (maxSize).

For this change to take effect in the User Interface, you will need to restart Tomcat.


Note

Changes made to the configuration will not automatically be displayed in the user interface. By default, only the following fields are displayed: dc.title, dc.contributor.author, dc.creator, dc.contributor, dc.date.issued, dc.publisher, dc.description.abstract and fulltext.

If additional fields are required, look for the "itemSummaryList" template.

Below is an example configuration of hit highlighting.

Code Block
languagexml
<property name="hitHighlightingConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightingConfiguration">
        <property name="metadataFields">
            <list>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.title"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.contributor.author"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.subject"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.description.abstract"/>
                    <!-- Max number of characters to display around the matching word (Warning: setting to 0 returns the entire field) -->
                    <property name="maxSize" value="250"/>
                    <!-- Max number of snippets (matching words) to show -->
                    <property name="snippets" value="2"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <!-- Displays snippets from indexed full text of document (for supported formats) -->
                    <property name="field" value="fulltext"/>
                    <!-- Max number of characters to display around the matching word (Warning: setting to 0 returns the entire field) -->
                    <property name="maxSize" value="250"/>
                    <!-- Max number of snippets (matching words) to show -->
                    <property name="snippets" value="2"/>
                </bean>
            </list>
        </property>
    </bean>
</property>

The property name & the bean class are mandatory. The property field names are:

  • field (mandatory): The metadata field to be highlighted (can also be * if all the metadata fields should be highlighted).
  • maxSize (optional): Limit the number of characters displayed to only the relevant part (use metadata field as search snippet).
  • snippets (optional): The maximum number of snippets that can be found in one metadata field.

...

Code Block
languagexml
<bean id="tagCloudConfiguration" class="org.dspace.discovery.configuration.TagCloudConfiguration">
    <!-- Should display the score of each tag next to it? Default: false -->
    <property name="displayScore" value="true"/>
    <!-- Should display the tag as center aligned in the page or left aligned? Possible values: true | false. Default: true -->
    <property name="shouldCenter" value="true"/>
    <!-- How many tags will be shown. Value -1 means all of them. Default: -1 -->
    <property name="totalTags" value="-1"/>
    <!-- The letter case of the tags.
         Possible values: Case.LOWER | Case.UPPER | Case.CAPITALIZATION | Case.PRESERVE_CASE | Case.CASE_SENSITIVE
         Default: Case.PRESERVE_CASE -->
    <property name="cloudCase" value="Case.PRESERVE_CASE"/>
    <!-- Should the 3 CSS classes of the tag cloud be independent of score (random) or based on the score? Possible values: true | false. Default: true -->
    <property name="randomColors" value="true"/>
    <!-- The font size (in em) for the tag with the lowest score. Possible values: any decimal. Default: 1.1 -->
    <property name="fontFrom" value="1.1"/>
    <!-- The font size (in em) for the tag with the highest score. Possible values: any decimal. Default: 3.2 -->
    <property name="fontTo" value="3.2"/>
    <!-- Tags with a score lower than this value will not appear in the tag cloud. Possible values: any integer from 1 to infinity. Default: 0 -->
    <property name="cuttingLevel" value="0"/>
    <!-- The distance (in px) between the tags. Default: 5 -->
    <property name="marginRight" value="5"/>
    <!-- The ordering of the tags (based either on the name or the score of the tag)
         Possible values: Tag.NameComparatorAsc | Tag.NameComparatorDesc | Tag.ScoreComparatorAsc | Tag.ScoreComparatorDesc
         Default: Tag.NameComparatorAsc -->
    <property name="ordering" value="Tag.NameComparatorAsc"/>
</bean>

When the tag cloud is rendered, it uses the following CSS classes, which you can restyle to change its appearance.

Class        Note
tagcloud     General class for the whole tagcloud
tagcloud_1   Specific tag class for tag of type 1 (based on score)
tagcloud_2   Specific tag class for tag of type 2 (based on score)
tagcloud_3   Specific tag class for tag of type 3 (based on score)

Disabling the "Has file(s)" facet

Since DSpace 6, a new "Has file(s)" facet has been enabled by default. This facet shows whether items have or do not have any bitstreams in the "ORIGINAL" bundle.
Should you want to turn this off, you can edit [dspace]/config/spring/api/discovery.xml to remove the following line from the defaultConfiguration and homepageConfiguration beans (in the sidebarFacets property):

 

Code Block
languagexml
<ref bean="searchFilterContentInOriginalBundle"/>

Then restart your servlet container.

...

Discovery Solr Index Maintenance

Command used:

[dspace]/bin/dspace index-discovery [-cbhf[r <item handle>]]

Java class:

org.dspace.discovery.IndexClient

Arguments (short and long forms):

Description

 

called without any options, will update/clean an existing index

-b

(re)build index, wiping out current one if it exists

-c

clean existing index removing any documents that no longer exist in the db

-f

if updating existing index, force each handle to be reindexed even if up to date

-h

print this help message

-i <object handle>

Reindex an individual object (and any child objects). When run on an Item, it reindexes only that single Item. When run on a Collection, it reindexes the Collection itself and all Items in that Collection. When run on a Community, it reindexes the Community itself and all sub-Communities, contained Collections and contained Items.

-o

optimize search core

-r <item handle>

remove an Item, Collection or Community from index based on its handle

-s

Rebuild the spellchecker; can be combined with -b and -f.
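
As an illustration of how the options above combine, the sketch below wraps two invocations in shell functions. The function names are hypothetical, DSPACE_DIR stands in for [dspace] (its default of /dspace is an assumption), and the handle in the usage note is a made-up example.

```shell
#!/bin/sh
# Hypothetical wrappers around the index-discovery options listed above.
# DSPACE_DIR stands in for [dspace]; the default path is an assumption.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"

# Reindex one object (and any child objects) by its handle, using -i.
reindex_object() {
    "$DSPACE_DIR/bin/dspace" index-discovery -i "$1"
}

# Rebuild the whole index and the spellchecker, combining -b and -s.
rebuild_index() {
    "$DSPACE_DIR/bin/dspace" index-discovery -b -s
}
```

For example, reindex_object 123456789/10 would reindex the object with that (hypothetical) handle, while rebuild_index wipes and rebuilds the whole index.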

It is recommended to run maintenance on the Discovery Solr index occasionally (from crontab or your system's scheduler), to prevent your servlet container from running out of memory:

...
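
One way to schedule such maintenance is a small wrapper script called from cron. The following is a sketch under stated assumptions, not the project's recommended script: the script path, the log location, the DSPACE_DIR default and the weekly schedule shown in the comment are all placeholders. Only the -o option comes from the table above.

```shell
#!/bin/sh
# Sketch of a maintenance wrapper for cron (hypothetical script; adjust paths).
# Example crontab entry (assumed schedule: Sundays at 03:00):
#   0 3 * * 0 /usr/local/bin/discovery-maintenance.sh
DSPACE_DIR="${DSPACE_DIR:-/dspace}"
LOG_FILE="${LOG_FILE:-/tmp/discovery-maintenance.log}"

# Optimize the Discovery search core (-o) and append the output,
# preceded by a timestamp line, to the log file.
# Returns the exit status of the dspace command.
optimize_discovery() {
    echo "$(date) optimizing Discovery index" >> "$LOG_FILE"
    "$DSPACE_DIR/bin/dspace" index-discovery -o >> "$LOG_FILE" 2>&1
}

# Only run when the dspace launcher actually exists at the assumed path.
if [ -x "$DSPACE_DIR/bin/dspace" ]; then
    optimize_discovery
fi
```

Keeping the command in a function makes it straightforward to add further steps later, such as -c to clean the index, without changing the crontab entry.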

Internationalization

Discovery currently has its own messages.xml file, located at dspace-xmlui/src/main/resources/aspects/Discovery/i18n/messages.xml. To add your own labels for new fields and facets in a Maven overlay, copy this file to dspace/modules/xmlui/src/main/resources/aspects/Discovery/i18n/messages.xml and modify the copy. Alternatively, you may add them to the main messages.xml file. The same goes for translations: you are encouraged to submit a single messages_XX.xml file that includes the messages from all the separate messages.xml files in DSpace.

Advanced search related keys (change "author" to desired field)

Filter name: xmlui.ArtifactBrowser.SimpleSearch.filter.author
Facet heading: xmlui.ArtifactBrowser.AdvancedSearch.type_author
"Filter by" page heading: xmlui.Discovery.AbstractSearch.type_author