...
Watch the DSpace Discovery introduction video
Info |
---|
Since DSpace 6.0, Discovery is the only out-of-the-box Search and Browse infrastructure provided in DSpace. |
From the user perspective, faceted search (also called faceted navigation, guided navigation, or parametric search) breaks up search results into multiple categories, typically showing counts for each, and allows the user to "drill down" or further restrict their search results based on those facets.
...
This is a classic "tag cloud" facet in a DSpace repository.
The legacy search engine (based on Apache Lucene) and the legacy Browse system (based on database tables) were removed in DSpace 6.0. DSpace now uses only Discovery for all Search/Browse capabilities.
In addition, to support the new Configuration options, all of the Discovery configurations in discovery.cfg have been prefixed with "discovery." (see configuration below).
The new JSPUI-only tag cloud facet feature is disabled by default. In order to enable it, you will need to set up the corresponding processor that the PluginManager will load to actually perform the tag cloud query on the relevant pages. This is configured in the dspace.cfg configuration file using the following properties:
...
...
Because Discovery was adopted as the default infrastructure for search and browse in DSpace 4, no manual steps are required to enable Discovery. If you want to enable Discovery on older versions of DSpace, please refer to the DSpace documentation for that particular version.
If you have upgraded from an older version of DSpace, your database may still include outdated "bi_*" tables (where "bi" = "browse index"). When Discovery is enabled, these tables are no longer necessary, as Discovery takes over this browse index function.
To clean up all these old "bi_*" tables, simply run:
Code Block |
---|
[dspace]/bin/dspace index-db-browse -f -d |
The configuration for Discovery is located in two separate files:
The discovery.cfg file is located in the [dspace-install-dir]/config/modules directory.
The discovery.xml file is located in the [dspace-install-dir]/config/spring/api/ directory.
config/modules/discovery.cfg
The discovery.cfg file is located in the [dspace]/config/modules directory and contains the following properties. Any of these properties may be overridden in your local.cfg (see Configuration Reference):
Property: | discovery.search.server | ||
Example Value: |
| ||
Informational Note: | Discovery relies on a Solr index for storage and retrieval of its information. This parameter determines the location of the Solr index. If you are uncertain whether this property is set correctly, you can use a command-line tool like "wget" to perform a query against the Solr index (and ensure Solr responds). For example, the query below searches the Solr index for "test" and returns the response on standard output:
| ||
Property: | discovery.index.authority.ignore[.field] | ||
Example Value: |
| ||
Informational Note: | By default, Discovery will use the authority information in the metadata to disambiguate homonyms. Setting this property to false will make the indexing process behave as if the metadata did not include authority information. The configuration can differ on a per-field (<schema>.<element>.<qualifier>) basis; the property without a field sets the default value. | ||
Property: | discovery.index.authority.ignore-prefered[.field] | ||
Example Value: |
| ||
Informational Note: | By default, Discovery will use the authority information in the metadata to query the authority for the preferred label. Setting this property to false will make the indexing process behave as if the metadata did not include authority information (i.e. the preferred form is the one recorded in the metadata value). The configuration can differ on a per-field (<schema>.<element>.<qualifier>) basis; the property without a field sets the default value. If the authority is a remote service, disabling this feature can greatly improve performance. | ||
Property: | discovery.index.authority.ignore-variants[.field] | ||
Example Value: |
| ||
Informational Note: | By default, Discovery will use the authority information in the metadata to query the authority for variants. Setting this property to false will make the indexing process behave as if the metadata did not include authority information. The configuration can differ on a per-field (<schema>.<element>.<qualifier>) basis; the property without a field sets the default value. If the authority is a remote service, disabling this feature can greatly improve performance. |
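The per-field override rule described for the three authority properties (a key with a field suffix takes precedence, the key without a field supplies the default) can be illustrated with a small sketch. This is not DSpace code: the function name resolve_flag, the fallback argument and the sample values are assumptions made only for illustration.

```python
from typing import Mapping


def resolve_flag(props: Mapping[str, str], base_key: str, field: str,
                 fallback: bool = False) -> bool:
    """Resolve a boolean property that may be overridden per metadata field.

    Lookup order (hypothetical helper, mirroring the rule in the text):
      1. "<base_key>.<schema>.<element>.<qualifier>"  (field-specific value)
      2. "<base_key>"                                 (default for all fields)
      3. fallback                                     (assumed value if unset)
    """
    for key in (f"{base_key}.{field}", base_key):
        if key in props:
            return props[key].strip().lower() == "true"
    return fallback


# Sample values chosen for illustration only.
props = {
    "discovery.index.authority.ignore": "false",
    "discovery.index.authority.ignore.dc.contributor.author": "true",
}
base = "discovery.index.authority.ignore"
print(resolve_flag(props, base, "dc.contributor.author"))  # True
print(resolve_flag(props, base, "dc.subject"))              # False
```

The same lookup order would apply to the ignore-prefered and ignore-variants properties by passing a different base key.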
...
The discovery.xml
file is located in the [dspace-install-dir]/config/spring/api
directory.
...
The hit highlighting configuration element contains all settings necessary to display search snippets & enable hit highlighting.
Warning |
---|
This section only applies to XMLUI. JSPUI does not currently support "highlighting & search snippets". |
Info |
---|
You can disable hit highlighting / search snippets by commenting out the entire <property name="hitHighlightingConfiguration"> definition. PLEASE BE AWARE that this <property> definition exists in two sections; you should comment out both. Alternatively, you may choose to tweak which fields are shown in hit highlighting, or modify the number of matching words shown (snippets) and/or the number of characters shown around the matching word (maxSize). For any of these changes to take effect in the User Interface, you will need to restart Tomcat. |
Note |
---|
Changes made to the configuration will not automatically be displayed in the user interface. By default, only the following fields are displayed: dc.title, dc.contributor.author, dc.creator, dc.contributor, dc.date.issued, dc.publisher, dc.description.abstract and fulltext. If additional fields are required, look for the "itemSummaryList" template. |
Below is an example configuration of hit highlighting.
Code Block | ||
---|---|---|
| ||
<property name="hitHighlightingConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightingConfiguration">
        <property name="metadataFields">
            <list>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.title"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.contributor.author"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.subject"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.description.abstract"/>
                    <!-- Max number of characters to display around the matching word (Warning setting to 0 returns entire field) -->
                    <property name="maxSize" value="250"/>
                    <!-- Max number of snippets (matching words) to show -->
                    <property name="snippets" value="2"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <!-- Displays snippets from indexed full text of document (for supported formats) -->
                    <property name="field" value="fulltext"/>
                    <!-- Max number of characters to display around the matching word (Warning setting to 0 returns entire field) -->
                    <property name="maxSize" value="250"/>
                    <!-- Max number of snippets (matching words) to show -->
                    <property name="snippets" value="2"/>
                </bean>
            </list>
        </property>
    </bean>
</property> |
The property name & the bean class are mandatory. The property field names are:
field: the metadata field to be highlighted (can also be * if all the metadata fields should be highlighted).
maxSize: the maximum number of characters to display around the matching word (setting it to 0 returns the entire field).
snippets: the maximum number of snippets (matching words) to show.
The org.dspace.discovery.DiscoverQuery object has a setter & getter for the hit highlighting configuration set in the Discovery configuration. If this configuration is given, the resolveToSolrQuery method located in the org.dspace.discovery.SolrServiceImpl class will use the standard Solr highlighting feature (http://wiki.apache.org/solr/HighlightingParameters). The org.dspace.discovery.DiscoverResult class has a method to set the highlighted fields for each object & field.
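To make the relation between the field, snippets and maxSize settings and the standard Solr highlighting parameters more concrete, here is a rough sketch. It is not the resolveToSolrQuery implementation: HighlightField and solr_highlight_params are hypothetical names, and the Solr field names DSpace actually sends may differ from the plain metadata field names used here. Only the Solr parameter names (hl, hl.fl, hl.snippets, hl.fragsize and their per-field f.<field>. form) are standard.

```python
from typing import Dict, List, NamedTuple, Optional


class HighlightField(NamedTuple):
    """One hit highlighting field entry (hypothetical stand-in for the bean)."""
    field: str
    snippets: Optional[int] = None   # max number of snippets to show
    max_size: Optional[int] = None   # max characters around the match


def solr_highlight_params(fields: List[HighlightField]) -> Dict[str, str]:
    """Translate highlight settings into standard Solr highlighting parameters."""
    params = {
        "hl": "true",                               # switch highlighting on
        "hl.fl": ",".join(f.field for f in fields), # fields to highlight
    }
    for f in fields:
        if f.snippets is not None:
            # Per-field override of hl.snippets
            params[f"f.{f.field}.hl.snippets"] = str(f.snippets)
        if f.max_size is not None:
            # Per-field override of hl.fragsize (fragment size in characters)
            params[f"f.{f.field}.hl.fragsize"] = str(f.max_size)
    return params


example = solr_highlight_params([
    HighlightField("dc.title", snippets=5),
    HighlightField("dc.description.abstract", snippets=2, max_size=250),
])
for name, value in example.items():
    print(f"{name}={value}")
```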
The rendering of search results is no longer handled by the METS format but uses a special type of list named "TYPE_DSO_LIST". Each metadata field (& fulltext if configured) is added to the DRI, and if the field contains hit highlighting, the Java code will split up the string & add DRI highlights to the list. The XSL for the themes also contains special rendering XSL for the DRI; for Mirage, the changes are located in the discovery.xsl file. For themes based on the old structural.xsl, look for the template matching "dri:list[@type='dsolist']".
Warning |
---|
This section only applies to XMLUI. The JSPUI does not currently support the "More like this" feature. |
The "more like this"-configuration element contains all the settings for displaying related items on an item display page.
Below is an example of the "more like this" configuration.
Code Block | ||
---|---|---|
| ||
<property name="moreLikeThisConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoveryMoreLikeThisConfiguration">
        <property name="similarityMetadataFields">
            <list>
                <value>dc.title</value>
                <value>dc.contributor.author</value>
                <value>dc.creator</value>
                <value>dc.subject</value>
            </list>
        </property>
        <!--The minimum number of matching terms across the metadata fields above before an item is found as related -->
        <property name="minTermFrequency" value="5"/>
        <!--The maximum number of related items displayed-->
        <property name="max" value="3"/>
        <!--The minimum word length below which words will be ignored-->
        <property name="minWordLength" value="5"/>
    </bean>
</property> |
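As a rough illustration of what the "more like this" settings mean in Solr terms, the sketch below maps them onto Solr's standard MoreLikeThis request parameters (mlt, mlt.fl, mlt.mintf, mlt.count, mlt.minwl). The function name and the exact mapping are assumptions made for illustration; the Solr field names DSpace really queries may differ from the metadata field names shown.

```python
from typing import Dict, Sequence


def solr_more_like_this_params(similarity_fields: Sequence[str],
                               min_term_frequency: int,
                               max_items: int,
                               min_word_length: int) -> Dict[str, str]:
    """Map the example settings onto Solr MoreLikeThis parameters (assumed mapping)."""
    return {
        "mlt": "true",                             # enable MoreLikeThis
        "mlt.fl": ",".join(similarity_fields),     # similarityMetadataFields
        "mlt.mintf": str(min_term_frequency),      # minTermFrequency
        "mlt.count": str(max_items),               # max (related items returned)
        "mlt.minwl": str(min_word_length),         # minWordLength
    }


mlt = solr_more_like_this_params(
    ["dc.title", "dc.contributor.author", "dc.creator", "dc.subject"],
    min_term_frequency=5,
    max_items=3,
    min_word_length=5,
)
for name, value in mlt.items():
    print(f"{name}={value}")
```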
...
Class | Note |
---|---|
tagcloud | General class for the whole tagcloud |
tagcloud_1 | Specific tag class for tag of type 1 (based on score) |
tagcloud_2 | Specific tag class for tag of type 2 (based on score) |
tagcloud_3 | Specific tag class for tag of type 3 (based on score) |
Edit [dspace]/config/spring/api/discovery.xml to remove the following line from the defaultConfiguration and homepageConfiguration beans (in the sidebarFacets property):
Code Block | ||
---|---|---|
| ||
<ref bean="searchFilterContentInOriginalBundle"/> |
Then restart your servlet container.
Command used: | [dspace]/bin/dspace index-discovery | ||
Java class: | org.dspace.discovery.IndexClient | ||
Arguments (short and long forms): | Description | ||
| called without any options, will update/clean an existing index | ||
-b | (re)build index, wiping out current one if it exists | ||
| clean existing index removing any documents that no longer exist in the db | ||
-f | if updating existing index, force each handle to be reindexed even if uptodate | ||
| print this help message | ||
-i <object handle> | Reindex an individual object (and any child objects). When run on an Item, it just reindexes that single Item. When run on a Collection, it reindexes the Collection itself and all Items in that Collection. When run on a Community, it reindexes the Community itself and all sub-Communities, contained Collections and contained Items. | ||
-o | optimize search core | ||
| remove an Item, Collection or Community from index based on its handle | ||
-s | Rebuild the spellchecker; can be combined with -b and -f. | ||
It is strongly recommended to run maintenance on the Discovery Solr index occasionally (from crontab or your system's scheduler), to prevent your servlet container from running out of memory:
[dspace]/bin/dspace index-discovery -o
(Since Solr 4, the underlying optimize operation has been discouraged as mostly unnecessary and renamed. See https://issues.apache.org/jira/browse/SOLR-3141).
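If the maintenance run is triggered from a scheduled script rather than directly from crontab, a thin wrapper might look like the sketch below. The install path /dspace and the helper names are assumptions; only the command line "[dspace]/bin/dspace index-discovery -o" comes from the text above.

```python
import subprocess
from typing import List


def index_discovery_command(dspace_dir: str, *options: str) -> List[str]:
    """Build the argument list for '[dspace]/bin/dspace index-discovery <options>'."""
    return [f"{dspace_dir.rstrip('/')}/bin/dspace", "index-discovery", *options]


def run_maintenance(dspace_dir: str) -> int:
    """Run the optimize step and return the exit code of the dspace launcher."""
    command = index_discovery_command(dspace_dir, "-o")
    return subprocess.run(command, check=False).returncode


# "/dspace" is a placeholder for your own [dspace] installation directory.
print(" ".join(index_discovery_command("/dspace", "-o")))
```

Calling run_maintenance("/dspace") from the scheduler then executes the command and exposes its exit code, which the scheduler can use to report failures.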
Discovery is built as an application layer on top of the Solr open source enterprise search server. Therefore, Solr configuration can be applied to the Solr cores that are shipped with DSpace.
The DSpace Solr instance itself currently runs several cores (which means indexes in Solr parlance). The "statistics" core is for collection of DSpace usage events for statistical purposes (if you have been collecting statistics for multiple years, you may have chosen to use sharding and you will see one core per year collected). The "search" core is used by Discovery for search and faceting, and for displaying the collection/community hierarchy and item counts. The "authority" core is used by SolrAuthority to store information about authors, including their data imported from the ORCID registry.
Code Block |
---|
solr
├── solr.xml
├── search
│   └── conf
│       ├── admin-extra.html
│       ├── elevate.xml
│       ├── protwords.txt
│       ├── schema.xml
│       ├── scripts.conf
│       ├── solrconfig.xml
│       ├── spellings.txt
│       ├── stopwords.txt
│       ├── synonyms.txt
│       └── xslt
│           ├── DRI.xsl
│           ├── example.xsl
│           ├── example_atom.xsl
│           ├── example_rss.xsl
│           └── luke.xsl
├── ...
└── statistics
    └── conf
        ├── admin-extra.html
        ├── elevate.xml
        ├── protwords.txt
        ├── schema.xml
        ├── scripts.conf
        ├── solrconfig.xml
        ├── spellings.txt
        ├── stopwords.txt
        ├── synonyms.txt
        └── xslt
            ├── example.xsl
            ├── example_atom.xsl
            ├── example_rss.xsl
            └── luke.xsl |
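A quick way to confirm that the "search" core responds is to send it a simple query over HTTP, similar to the wget check mentioned for the search server property. The sketch below only uses the Python standard library. The base URL http://localhost:8080/solr/search is an assumption and must be replaced by the value configured for your own installation; the /select handler with the q, rows and wt parameters is standard Solr.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_select_url(search_server: str, query: str, rows: int = 10) -> str:
    """Build a Solr /select URL for a simple keyword query with a JSON response."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{search_server.rstrip('/')}/select?{params}"


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch the URL and return the response body as text (raises on HTTP errors)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Assumed location of the "search" core; adjust to your configuration.
url = solr_select_url("http://localhost:8080/solr/search", "test")
print(url)
```

Passing the printed URL to fetch(url) returns the raw Solr response; an exception at that point usually means the configured location is wrong or Solr is not running.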
Discovery has its own messages.xml file, located at dspace-xmlui/src/main/resources/aspects/Discovery/i18n/messages.xml. To add your own labels for new fields and facets in a Maven overlay, copy this file to dspace/modules/xmlui/src/main/resources/aspects/Discovery/i18n/messages.xml and modify this file. Alternatively, you may add them to the main messages.xml file. Same goes for translations - it's encouraged to submit a single messages_XX.xml file including messages from all the separate messages.xml files in DSpace.
Advanced search related keys (change "author" to desired field)
Filter name | xmlui.ArtifactBrowser.SimpleSearch.filter.author |
Facet heading | xmlui.ArtifactBrowser.AdvancedSearch.type_author |
"Filter by" page heading | xmlui.Discovery.AbstractSearch.type_author |
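Since the three keys follow a fixed pattern with only the field name changing, the keys needed for a new field can be generated mechanically. The helper below is a small illustration (the function name is hypothetical); the key patterns themselves are taken from the table above.

```python
from typing import Dict


def discovery_message_keys(field: str) -> Dict[str, str]:
    """Return the advanced search message keys for one field name."""
    return {
        "Filter name": f"xmlui.ArtifactBrowser.SimpleSearch.filter.{field}",
        "Facet heading": f"xmlui.ArtifactBrowser.AdvancedSearch.type_{field}",
        '"Filter by" page heading': f"xmlui.Discovery.AbstractSearch.type_{field}",
    }


# "subject" is used here only as an example field name.
for label, key in discovery_message_keys("subject").items():
    print(f"{label}: {key}")
```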