...

Watch the DSpace Discovery introduction video

Info

Since DSpace 6.0, Discovery is the only out-of-the-box Search and Browse infrastructure provided in DSpace.

What is a Sidebar Facet

From the user perspective, faceted search (also called faceted navigation, guided navigation, or parametric search) breaks up search results into multiple categories, typically showing counts for each, and allows the user to "drill down" or further restrict their search results based on those facets.

...

 This is a classic "tag cloud" facet in a DSpace repository.

Discovery Changelist

DSpace 6.0

The legacy search engine (based on Apache Lucene) and the legacy Browse system (based on database tables) were removed in DSpace 6.0. DSpace now uses only Discovery for all Search/Browse capabilities.

In addition, to support the new Configuration options, all of the Discovery configurations in discovery.cfg have been prefixed with "discovery." (see configuration below).

DSpace 5.0

The new JSPUI-only tag cloud facet feature is disabled by default. In order to enable it, you will need to set up the corresponding processor that the PluginManager will load to actually perform the tag cloud query on the relevant pages. This is configured in the dspace.cfg configuration file using the following properties:

...

  • Sidebar browse facets that can be configured to use contents from any metadata field
    • Dynamically generated timespans for dates
  • Customizable "recent submissions" view on the repository homepage, collection and community pages
  • Hit highlighting & search snippets

Enabling Discovery

Because Discovery was adopted as the default infrastructure for search and browse in DSpace 4, no manual steps are required to enable Discovery. If you want to enable Discovery on older versions of DSpace, please refer to the DSpace documentation for that particular version.

Removing Legacy Browse Tables (bi_*) from your Database

If you have upgraded from an older version of DSpace, your database may still include outdated "bi_*" tables (where "bi" = "browse index").  When Discovery is enabled, these tables are no longer necessary, as Discovery takes over this browse index function.

To clean up all these old "bi_*" tables, simply run:

Code Block
[dspace]/bin/dspace index-db-browse -f -d

Configuration files

The configuration for Discovery is located in two separate files.

  • General settings: The discovery.cfg file is located in the [dspace-install-dir]/config/modules directory.
  • User Interface Configuration: The discovery.xml file is located in the [dspace-install-dir]/config/spring/api/ directory.

General Discovery settings (config/modules/discovery.cfg)

The discovery.cfg file is located in the [dspace]/config/modules directory and contains the following properties. Any of these properties may be overridden in your local.cfg (see Configuration Reference):

Property:

discovery.search.server

Example Value:

discovery.search.server=http://localhost:8080/solr/search

Informational Note:

Discovery relies on a Solr index for storage and retrieval of its information. This parameter determines the location of the Solr index.

If you are uncertain whether this property is set correctly, you can use a command-line tool like "wget" to perform a query against the Solr index (and ensure Solr responds). For example, the query below searches the Solr index for "test" and writes the response to standard output:

wget -O - http://localhost:8080/solr/search/select?q=test
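The same check can be scripted so that it only reports whether Solr answered. The following is a minimal sketch, not part of DSpace: the function names solr_select_url and solr_is_up are hypothetical, and it assumes that curl is installed and that the base URL passed in matches your discovery.search.server value.

```shell
#!/bin/sh
# Hypothetical helper: build the Solr select URL from a core base URL and a query.
# The query is inserted as-is (no URL encoding), so keep it to a simple word.
solr_select_url() {
    # $1 = value of discovery.search.server, $2 = query term
    printf '%s/select?q=%s\n' "${1%/}" "$2"
}

# Hypothetical helper: succeed only when Solr answers the query with HTTP 200.
solr_is_up() {
    # $1 = value of discovery.search.server, $2 = optional query term (default: test)
    status=$(curl -s -o /dev/null -w '%{http_code}' "$(solr_select_url "$1" "${2:-test}")")
    [ "$status" = "200" ]
}

# Example:
#   solr_is_up http://localhost:8080/solr/search && echo "Solr responded"
```

The URL is quoted when passed to curl so that the shell does not treat the "?" as a wildcard.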

Property:

discovery.index.authority.ignore[.field]

Example Value:

discovery.index.authority.ignore=true

discovery.index.authority.ignore.dc.contributor.author=false

Informational Note:

By default, Discovery uses the authority information in the metadata to disambiguate homonyms. Setting this property to true makes the indexing process behave as if the metadata did not include authority information. The setting can differ per field (<schema>.<element>.<qualifier>); the property without a field sets the default value.

Property:

discovery.index.authority.ignore-prefered[.field]

Example Value:

discovery.index.authority.ignore-prefered=true

discovery.index.authority.ignore-prefered.dc.contributor.author=false

Informational Note:

By default, Discovery uses the authority information in the metadata to query the authority for the preferred label. Setting this property to true makes the indexing process behave as if the metadata did not include authority information (i.e. the preferred form is the one recorded in the metadata value). The setting can differ per field (<schema>.<element>.<qualifier>); the property without a field sets the default value. If the authority is a remote service, disabling this feature can greatly improve performance.

Property:

discovery.index.authority.ignore-variants[.field]

Example Value:

discovery.index.authority.ignore-variants=true

discovery.index.authority.ignore-variants.dc.contributor.author=false

Informational Note:

By default, Discovery uses the authority information in the metadata to query the authority for variants. Setting this property to true makes the indexing process behave as if the metadata did not include authority information. The setting can differ per field (<schema>.<element>.<qualifier>); the property without a field sets the default value. If the authority is a remote service, disabling this feature can greatly improve performance.
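The per-field fallback shared by the three authority properties above can be pictured as a small lookup: use the field-specific value if one is set, otherwise use the property without a field. The sketch below only illustrates that documented rule and is not DSpace's actual implementation. The helper names get_prop and effective_setting are hypothetical, and it assumes a simple key=value properties file.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from a key=value properties file.
# Prints nothing when the key is absent. Comment lines starting with # are skipped.
get_prop() {
    # $1 = properties file, $2 = key
    awk -v key="$2" '
        /^[ \t]*#/ { next }
        {
            pos = index($0, "=")
            if (pos == 0) next
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }' "$1"
}

# Hypothetical helper: resolve a setting for one metadata field.
# The field-specific property wins; otherwise the property without a field applies.
effective_setting() {
    # $1 = properties file, $2 = base property, $3 = field (schema.element.qualifier)
    value=$(get_prop "$1" "$2.$3")
    [ -n "$value" ] || value=$(get_prop "$1" "$2")
    printf '%s\n' "$value"
}

# Example (file path is an assumption):
#   effective_setting /dspace/config/local.cfg discovery.index.authority.ignore dc.contributor.author
```

With the example values shown above, the lookup yields false for dc.contributor.author and true for any other field.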

...

The discovery.xml file is located in the [dspace-install-dir]/config/spring/api directory.

...

Code Block
<property name="recentSubmissionConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoveryRecentSubmissionsConfiguration">
        <property name="metadataSortField" value="dc.date.accessioned"/>
        <property name="type" value="date"/>
        <property name="max" value="5"/>
    </bean>
</property>

The property name & the bean class are mandatory. The property field names are discussed below.

  • metadataSortField (mandatory): The metadata field to sort on to retrieve the recent submissions
  • max (mandatory): The maximum number of results to be displayed as recent submissions
  • type (optional): The type of the search filter. It can be either date or text; if none is defined, text will be used.

Customizing hit highlighting & search snippets

The hit highlighting configuration element contains all settings necessary to display search snippets & enable hit highlighting.

Warning

This section only applies to XMLUI. JSPUI does not currently support "highlighting & search snippets".


Info
Disabling hit highlighting / search snippets

You can disable hit highlighting / search snippets by commenting out the entire <property name="hitHighlightingConfiguration"> configuration in the [dspace]/config/spring/api/discovery.xml configuration file.

Please be aware that there are two sections where this <property> definition exists, and you should comment out both: one under <bean id="defaultConfiguration"> and one under <bean id="homepageConfiguration">.

Alternatively, you may also choose to tweak which fields are shown in hit highlighting, or modify the number of matching words shown (snippets) and/or number of characters shown around the matching word (maxSize).

For this change to take effect in the User Interface, you will need to restart Tomcat.
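Before editing, it can help to list every place where the property is defined so that neither section is missed. The helper below is a hedged sketch: the function name find_hit_highlighting is hypothetical, and the path in the example is an assumption to be replaced with your own [dspace] directory.

```shell
#!/bin/sh
# Hypothetical helper: print the line number and text of every
# hitHighlightingConfiguration property in a discovery.xml file.
# Definitions that are already commented out will still be listed.
find_hit_highlighting() {
    # $1 = path to discovery.xml
    grep -n '<property name="hitHighlightingConfiguration"' "$1"
}

# Example (path is an assumption):
#   find_hit_highlighting /dspace/config/spring/api/discovery.xml
```

Two matches are expected in a file that still contains both definitions described above.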


Warning

Changes made to the configuration will not automatically be displayed in the user interface. By default, only the following fields are displayed: dc.title, dc.contributor.author, dc.creator, dc.contributor, dc.date.issued, dc.publisher, dc.description.abstract and fulltext.

If additional fields are required, look for the "itemSummaryList" template.

...

Code Block
<property name="hitHighlightingConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightingConfiguration">
        <property name="metadataFields">
            <list>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.title"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.contributor.author"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.subject"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.description.abstract"/>
                    <!-- Max number of characters to display around the matching word (Warning: setting to 0 returns the entire field) -->
                    <property name="maxSize" value="250"/>
                    <!-- Max number of snippets (matching words) to show -->
                    <property name="snippets" value="2"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <!-- Displays snippets from indexed full text of document (for supported formats) -->
                    <property name="field" value="fulltext"/>
                    <!-- Max number of characters to display around the matching word (Warning: setting to 0 returns the entire field) -->
                    <property name="maxSize" value="250"/>
                    <!-- Max number of snippets (matching words) to show -->
                    <property name="snippets" value="2"/>
                </bean>
            </list>
        </property>
    </bean>
</property>

...

Command used:

[dspace]/bin/dspace index-discovery [-cbhf[r <item handle>]]

Java class:

org.dspace.discovery.IndexClient

Arguments (short and long forms):

Description

(no arguments)

Called without any options, will update/clean an existing index

-b

(re)build index, wiping out current one if it exists

-c

clean existing index removing any documents that no longer exist in the db

-f

if updating an existing index, force each handle to be reindexed even if up to date

-h

print this help message

-i <object handle>

Reindex an individual object (and any child objects). When run on an Item, it reindexes just that single Item. When run on a Collection, it reindexes the Collection itself and all Items in that Collection. When run on a Community, it reindexes the Community itself and all sub-Communities, contained Collections and contained Items.

-o

optimize search core

-r <item handle>

remove an Item, Collection or Community from index based on its handle

-s

Rebuild the spellchecker; can be combined with -b and -f.
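The options listed above can be combined into a few typical invocations. The wrapper below is only a sketch with a dry-run mode so that commands can be reviewed before they are run. DSPACE_BIN, DRY_RUN and run_index are hypothetical names, the default launcher path is an assumption, and the handle 123456789/42 is a made-up placeholder.

```shell
#!/bin/sh
# Assumption: DSPACE_BIN points at your [dspace]/bin/dspace launcher.
DSPACE_BIN=${DSPACE_BIN:-/dspace/bin/dspace}

# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run_index() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$DSPACE_BIN index-discovery${*:+ $*}"
    else
        "$DSPACE_BIN" index-discovery "$@"
    fi
}

# Examples (each uses only options from the table above):
#   run_index                    # update/clean the existing index
#   run_index -b                 # rebuild the index, wiping out the current one
#   run_index -i 123456789/42    # reindex one object (placeholder handle)
#   run_index -r 123456789/42    # remove one object from the index (placeholder handle)
```

Running with DRY_RUN=1 first is a cheap way to confirm the exact command line before a rebuild with -b, which wipes out the current index.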

...