
...

Another example: Using the standard search, a user would search for something like [wetland + "dc.author=Mitsch, William J" + dc.subject="water quality"]. With filtered search, they can start by searching for [wetland], and then filter the results by the other attributes, author and subject.

Discovery

...

Changelist

DSpace 1.7 

  • Sidebar browse facets that can be configured to use contents from any metadata field
    • Dynamically generated timespans for dates
  • Customizable "recent submissions" view on the repository homepage, collection and community pages
  • Hit highlighting & search snippets

DSpace 1.

...

  • Configuration moved from dspace.cfg into config/modules/discovery.cfg and config/spring/api/discovery.xml
  • Individual communities and collections can have their own Discovery configuration.
  • Tokenization for Auto-complete values (see SearchFilter)
  • Alphanumeric sorting for sidebar facets
  • Possibility to avoid indexing specific metadata fields.
  • Grouping of multiple metadata fields under the same SidebarFacet

DSpace 3

...

.0 

Info
Starting from DSpace 3.0, Discovery is also supported in JSPUI.

...

  • Hierarchical sidebar facets
  • Improved & more intuitive user interface
  • Access rights based results
  • Authority control & variants awareness (homonyms are shown separately in a facet if they have different authority IDs). All variant forms as recognized by the authority framework are indexed. See Authority Framework

XMLUI-only:

  • Hit highlighting and search snippets support
  • "More like this" (related items)

Bugfixes and other changes

  • Auto-complete functionality has been removed in XMLUI from search queries due to performance issues. JSPUI still supports auto-complete functionality without performance issues.

Enabling Discovery

...

You can independently enable Discovery for XMLUI or JSPUI. Follow the steps below.

Steps required for

...

XMLUI

As with any upgrade procedure, it is highly recommended that you back up your existing data thoroughly. Although upgrades in versions of Solr/Lucene do tend to be forward-compatible for the data stored in the Lucene index, it is always a best practice to back up your [dspace-install-dir]/solr/statistics cores to ensure no data is lost.

  1. Enable the Discovery Aspects in the XMLUI by changing the following settings in config/xmlui.xconf
    1. Comment out: SearchArtifacts
    2. Uncomment: Discovery

      Code Block
      <xmlui>
          <aspects>
              <!--
                  @deprecated: the Artifact Browser has been divided into ViewArtifacts,
                  BrowseArtifacts, SearchArtifacts
                  <aspect name="Artifact Browser" path="resource://aspects/ArtifactBrowser/" />
              -->
              <aspect name="Displaying Artifacts" path="resource://aspects/ViewArtifacts/" />
              <aspect name="Browsing Artifacts" path="resource://aspects/BrowseArtifacts/" />
              <!--<aspect name="Searching Artifacts" path="resource://aspects/SearchArtifacts/" />-->
              <aspect name="Administration" path="resource://aspects/Administrative/" />
              <aspect name="E-Person" path="resource://aspects/EPerson/" />
              <aspect name="Submission and Workflow" path="resource://aspects/Submission/" />
              <aspect name="Statistics" path="resource://aspects/Statistics/" />
      
              <!--
                  To enable Discovery, uncomment this Aspect that will enable it
                  within your existing XMLUI
                  Also make sure to comment the SearchArtifacts aspect
                  as leaving it on together with discovery will cause UI overlap issues-->
              <aspect name="Discovery" path="resource://aspects/Discovery/" />
      
      
              <!--
                  This aspect tests the various possible DRI features,
                  it helps a theme developer create themes
              -->
              <!-- <aspect name="XML Tests" path="resource://aspects/XMLTest/"/> -->
          </aspects>
      
  2. In config/dspace.cfg, enable the Discovery Indexing Consumer, which updates the Discovery indexes whenever content changes through XMLUI, JSPUI, SWORD, or LNI
    1. Add discovery to the list of event.dispatcher.default.consumers

      Code Block
      # default synchronous dispatcher (same behavior as traditional DSpace)
      event.dispatcher.default.class = org.dspace.event.BasicDispatcher
      #event.dispatcher.default.consumers = versioning, search, browse, eperson, harvester
      event.dispatcher.default.consumers = versioning, search, browse, discovery, eperson, harvester
      
    2. Change recent.submissions.count to zero

      Code Block
      #Set the recent submissions count to 0 so that Discovery can use its own recent submissions;
      # not doing this when Discovery is enabled will cause UI overlap issues
      #How many recent submissions should be displayed at any one time
      #recent.submissions.count = 5
      recent.submissions.count = 0
      
  3. Check that the port is correct for solr.search.server in config/modules/discovery.cfg
    1. If all of your traffic runs over port 80, then you need to remove the port from the URL

      Code Block
      ##### Search Indexing #####
      solr.search.server = http://localhost/solr/search
      
  4. From the command line, navigate to the [dspace] directory and run the command below to index the content of your DSpace instance into Discovery.

    Code Block
    ./bin/dspace update-discovery-index
    
    Panel

    NOTE: This step may take some time if you have a large number of items in your repository.

  5. Verify that you can see the Sidebar Facets on your DSpace homepage. Note that these are only visible when you have items in your repository.

...

Steps required for

...

JSPUI

As with any upgrade procedure, it is highly recommended that you back up your existing data thoroughly. Although upgrades in versions of Solr/Lucene do tend to be forward-compatible for the data stored in the Lucene index, it is always a best practice to back up your [dspace-install-dir]/solr/statistics cores to ensure no data is lost.

  1. Enable the Discovery search processor by changing the following settings in config/dspace.cfg
    1. Comment out: org.dspace.app.webui.search.LuceneSearchRequestProcessor
    2. Uncomment: org.dspace.app.webui.discovery.DiscoverySearchRequestProcessor

      Code Block
      plugin.single.org.dspace.app.webui.search.SearchRequestProcessor = \
              org.dspace.app.webui.discovery.DiscoverySearchRequestProcessor
      
  2. In config/dspace.cfg, enable the Discovery Indexing Consumer, which updates the Discovery indexes whenever content changes through XMLUI, JSPUI, SWORD, or LNI
    1. Add discovery to the list of event.dispatcher.default.consumers

      Code Block
      # default synchronous dispatcher (same behavior as traditional DSpace)
      event.dispatcher.default.class = org.dspace.event.BasicDispatcher
      #event.dispatcher.default.consumers = versioning, search, browse, eperson, harvester
      event.dispatcher.default.consumers = versioning, search, browse, discovery, eperson, harvester
      
      Note

      As it is not possible in JSPUI to use both search providers (Lucene and Discovery), it is generally more appropriate, but not required, to remove the "search" consumer from the list above. The "browse" consumer can be removed as well if you configure the Browse System to use Solr/Discovery as its backend (see Defining the Storage of the Browse Data).

    2. Enable the display of facets on the repository, community and collection home pages

      Code Block
      plugin.sequence.org.dspace.plugin.CommunityHomeProcessor = \
              org.dspace.app.webui.components.RecentCommunitySubmissions,\
              org.dspace.app.webui.discovery.SideBarFacetProcessor
      
      plugin.sequence.org.dspace.plugin.CollectionHomeProcessor = \
              org.dspace.app.webui.components.RecentCollectionSubmissions,\
              org.dspace.app.webui.discovery.SideBarFacetProcessor
      
      plugin.sequence.org.dspace.plugin.SiteHomeProcessor = \
              org.dspace.app.webui.discovery.SideBarFacetProcessor
      
      Note

      Please note that JSPUI (in contrast to XMLUI) still relies on the Browse Engine to show "recent submissions". The Browse Engine can be configured to use Solr/Discovery as its backend (see Defining the Storage of the Browse Data).

    3. Enable a JSON endpoint to provide the autocompletion feature in the search form

      Code Block
      plugin.named.org.dspace.app.webui.json.JSONRequest = \
          org.dspace.app.webui.discovery.DiscoveryJSONRequest = discovery
  3. Check that the port is correct for solr.search.server in config/modules/discovery.cfg
    1. If all of your traffic runs over port 80, then you need to remove the port from the URL

      Code Block
      ##### Search Indexing #####
      solr.search.server = http://localhost/solr/search
      
  4. From the command line, navigate to the [dspace] directory and run the command below to index the content of your DSpace instance into Discovery.

    Code Block
    ./bin/dspace update-discovery-index
    
    Panel

    NOTE: This step may take some time if you have a large number of items in your repository.

  5. Verify that you can see the Sidebar Facets on your DSpace homepage or that an empty search query will return all repository content. Note that these are only visible when you have items in your repository.

...

Property:

search.server

Example Value:

search.server=http://localhost:8080/solr/search

Informational Note:

Discovery relies on a Solr index for storage and retrieval of its information. This parameter determines the location of the Solr index.

Property:

index.ignore

Example Value:

index.ignore=dc.description.provenance,dc.language

Informational Note:

By default, Discovery will include all of the DSpace metadata in its search index. In cases where specific metadata is confidential, repository managers can exclude those fields from the index by adding them to this comma-separated list.

Property:

index.authority.ignore[.field]

Example Value:

index.authority.ignore=true

index.authority.ignore.dc.contributor.author=false

Informational Note:

By default, Discovery will use the authority information in the metadata to disambiguate homonyms. Setting this property to true will make the indexing process behave as if the metadata doesn't include authority information. The configuration can differ on a per-field (<schema>.<element>.<qualifier>) basis; the property without a field sets the default value.

Property:

index.authority.ignore-prefered[.field]

Example Value:

index.authority.ignore-prefered=true

index.authority.ignore-prefered.dc.contributor.author=false

Informational Note:

By default, Discovery will use the authority information in the metadata to query the authority for the preferred label. Setting this property to true will make the indexing process behave as if the metadata doesn't include authority information (i.e. the preferred form is the one recorded in the metadata value). The configuration can differ on a per-field (<schema>.<element>.<qualifier>) basis; the property without a field sets the default value. If the authority is a remote service, disabling this feature can greatly improve performance.

Property:

index.authority.ignore-variants[.field]

Example Value:

index.authority.ignore-variants=true

index.authority.ignore-variants.dc.contributor.author=false

Informational Note:

By default, Discovery will use the authority information in the metadata to query the authority for variants. Setting this property to true will make the indexing process behave as if the metadata doesn't include authority information. The configuration can differ on a per-field (<schema>.<element>.<qualifier>) basis; the property without a field sets the default value. If the authority is a remote service, disabling this feature can greatly improve performance.

...

Class:

DiscoveryConfigurationService

Purpose:

Defines the mapping between separate Discovery configurations and individual collections/communities

Default:

All communities, collections and the homepage (key=default) are mapped to defaultConfiguration

Class:

DiscoveryConfiguration

Purpose:

Groups configurations for sidebar facets, search filters, search sort options and recent submissions

Default:

There is one configuration by default called defaultConfiguration

Class:

DiscoverySearchFilter

Purpose:

Defines that specific metadata fields should be enabled as a search filter

Default:

dc.title, dc.contributor.author, dc.creator, dc.subject.* and dc.date.issued are defined as search filters

Class:

DiscoverySearchFilterFacet

Purpose:

Defines which metadata fields should be offered as contextual sidebar browse options. Each of these facets must also be a search filter.

Default:

dc.contributor.author, dc.creator, dc.subject.* and dc.date.issued

Class:

HierarchicalSidebarFacetConfiguration

Purpose:

Defines which metadata fields contain hierarchical data and should be offered as a contextual sidebar option

Class:

DiscoverySortConfiguration

Purpose:

Further specifies the sort options to which a DiscoveryConfiguration refers

Default:

dc.title and dc.date.issued are defined as alternatives for sorting, other than Relevance (hard-coded)

Class:

DiscoveryHitHighlightingConfiguration

Purpose:

Defines which metadata fields can contain hit highlighting & search snippets

Default:

dc.title, dc.contributor.author, dc.subject, dc.description.abstract & full text from text files.
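
As a rough illustration of the mapping that DiscoveryConfigurationService provides, the sketch below maps the key "default" to defaultConfiguration and adds one collection-specific entry. The handle 123456789/1234 and the bean name collectionConfiguration are hypothetical placeholders, and the bean id and the "map" property name are assumptions; check them against the config/spring/api/discovery.xml shipped with your DSpace version.

```xml
<!-- Sketch only: verify the bean id and property names against your own discovery.xml -->
<bean id="org.dspace.discovery.configuration.DiscoveryConfigurationService"
      class="org.dspace.discovery.configuration.DiscoveryConfigurationService">
    <property name="map">
        <map>
            <!-- "default" applies to the homepage and to anything without its own entry -->
            <entry key="default" value-ref="defaultConfiguration"/>
            <!-- Hypothetical handle and bean name for a collection-specific configuration -->
            <entry key="123456789/1234" value-ref="collectionConfiguration"/>
        </map>
    </property>
</bean>
```

Any bean referenced through value-ref would itself need to be defined as a DiscoveryConfiguration in the same file.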

...

In addition to the summarized descriptions of the default values, the following details help you to better understand these defaults. If you haven't already done so, download the configuration file and review it together with the following parameters.
The file contains one default configuration that defines the following sidebar facets, search filters, sort fields and recent submissions display:

  • Sidebar facets
    • searchFilterAuthor: groups the metadata fields dc.contributor.author & dc.creator with a facet limit of 10, sorted by occurrence count
    • searchFilterSubject: groups all subject metadata fields (dc.subject.*) with a facet limit of 10, sorted by occurrence count
    • searchFilterIssued: contains the dc.date.issued metadata field, which is identified with the type "date" and sorted by specific date values
  • Search filters
    • searchFilterTitle: contains the dc.title metadata field
    • searchFilterAuthor: contains the dc.contributor.author & dc.creator metadata fields
    • searchFilterSubject: contains the dc.subject.* metadata fields
    • searchFilterIssued: contains the dc.date.issued metadata field with the type "date"
  • Sort fields
    • sortTitle: contains the dc.title metadata field
    • sortDateIssued: contains the dc.date.issued metadata field; this sort is configured with the type "date"
  • defaultFilterQueries
    • The default configuration contains no defaultFilterQueries
    • Default filter queries are disabled, but the default configuration contains a commented-out example that makes Discovery return only items (as opposed to also communities/collections).
  • Recent Submissions
    • The recent submissions are sorted by dc.date.accessioned, which is a date, and a maximum of 5 recent submissions are displayed.
  • Hit highlighting
    • The fields dc.title, dc.contributor.author & dc.subject can contain hit highlighting.
    • The dc.description.abstract & full text fields are used to render search snippets.

...

  • indexFieldName (Required): A unique search filter name; the metadata will be indexed in Solr under this field name.
  • metadataFields (Required): A list of the metadata fields that need to be included in the facet.

...

  • facetLimit (optional): The maximum number of values to be shown. If none is specified, the default value "10" will be used. If the filter has the type date, this property will not be used since dates are automatically grouped together.
  • sortOrder (optional): The sort order for the sidebar facets; it can be either COUNT or VALUE. The default value is COUNT.
    • COUNT: Facets will be sorted by the number of times they appear in the repository
    • VALUE: Facets will be sorted alphabetically
  • type (optional): The type of the sidebar facet; it can be either "date" or "text". "text" is the default value.
    • text: The facets will be treated as is
    • date: Only the year will be stored in the Solr index. These years are automatically displayed in ranges that get smaller when you select one.
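
The properties listed above could be combined roughly as in the following sketch of two sidebar facet beans. The bean ids match the default facet names mentioned earlier; the indexFieldName values "author" and "dateIssued" and the fully qualified class name are assumptions made for illustration and should be checked against your own discovery.xml.

```xml
<!-- Sketch only: a text facet grouping two author fields, sorted by occurrence count -->
<bean id="searchFilterAuthor"
      class="org.dspace.discovery.configuration.DiscoverySearchFilterFacet">
    <!-- Assumed index field name; it must be unique among the search filters -->
    <property name="indexFieldName" value="author"/>
    <property name="metadataFields">
        <list>
            <value>dc.contributor.author</value>
            <value>dc.creator</value>
        </list>
    </property>
    <property name="facetLimit" value="10"/>
    <property name="sortOrder" value="COUNT"/>
</bean>

<!-- Sketch only: a date facet; facetLimit is omitted because it is not used for dates -->
<bean id="searchFilterIssued"
      class="org.dspace.discovery.configuration.DiscoverySearchFilterFacet">
    <!-- Assumed index field name -->
    <property name="indexFieldName" value="dateIssued"/>
    <property name="metadataFields">
        <list>
            <value>dc.date.issued</value>
        </list>
    </property>
    <property name="type" value="date"/>
    <property name="sortOrder" value="VALUE"/>
</bean>
```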

...

  • The list of applicable sidebarFacets
  • The list of applicable searchFilters
  • The list of applicable searchSortFields
  • Any default filter queries (optional)
  • The configuration for the Recent submissions display

...

  • the Recent submissions display

Configuring lists of sidebarFacets and searchFilters

Note

After modifying sidebarFacets and searchFilters, don't forget to reindex existing items by running [dspace]/bin/dspace update-discovery-index -b, otherwise the changes will not appear.

 

Below is an example of how one of these lists can be configured. It's important that each of the bean references corresponds to the exact name of the earlier defined facets, filters or sort options.
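
A minimal sketch of such lists, assuming they sit inside a DiscoveryConfiguration bean and that beans named searchFilterTitle, searchFilterAuthor, searchFilterSubject and searchFilterIssued have been defined earlier in the same file, might look like this:

```xml
<!-- Sketch only: each ref must match the id of a facet or filter bean defined earlier -->
<property name="sidebarFacets">
    <list>
        <ref bean="searchFilterAuthor"/>
        <ref bean="searchFilterSubject"/>
        <ref bean="searchFilterIssued"/>
    </list>
</property>
<property name="searchFilters">
    <list>
        <ref bean="searchFilterTitle"/>
        <ref bean="searchFilterAuthor"/>
        <ref bean="searchFilterSubject"/>
        <ref bean="searchFilterIssued"/>
    </list>
</property>
```

A misspelled bean reference typically prevents the Spring configuration from loading, so it is worth double-checking each name after editing.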

...

Warning

The Browse Engine only supports the "Access item based results" if the Solr/Discovery backend is enabled (see Defining the Storage of the Browse Data).

...

Warning

This paragraph only applies to XMLUI. The JSPUI relies on the Browse Engine to show "recent submissions". This requires that the Solr/Discovery backend is enabled (see Defining the Storage of the Browse Data).

...

Warning

This paragraph only applies to XMLUI. The JSPUI does not currently support "highlighting & search snippets".

The hit highlighting configuration element contains all the settings necessary to display search snippets & enable hit highlighting.

Warning

Changes made to the configuration will not automatically be displayed in the user interface. By default, only the following fields are displayed: dc.title, dc.contributor.author, dc.creator, dc.contributor, dc.date.issued, dc.publisher, dc.description.abstract and fulltext.

If additional fields are required, look for the "itemSummaryList" template.

Below is an example configuration of the hit highlighting.

Code Block
<property name="hitHighlightingConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightingConfiguration">
        <property name="metadataFields">
            <list>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.title"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.contributor.author"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.subject"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.description.abstract"/>
                    <property name="maxSize" value="250"/>
                    <property name="snippets" value="2"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="fulltext"/>
                    <property name="maxSize" value="250"/>
                    <property name="snippets" value="2"/>
                </bean>
            </list>
        </property>
    </bean>
</property>

...

The org.dspace.discovery.DiscoveryQuery object has a setter & getter for the hit highlighting configuration set in the Discovery configuration. If this configuration is given, the resolveToSolrQuery method located in the org.dspace.discovery.SolrServiceImpl class will use the standard Solr highlighting feature (http://wiki.apache.org/solr/HighlightingParameters). The org.dspace.discovery.DiscoverResult class has a method to set the highlighted fields for each object & field.

The rendering of search results is no longer handled by the METS format but uses a special type of list named "TYPE_DSO_LIST". Each metadata field (& fulltext if configured) is added in the DRI, and if the field contains hit highlighting, the Java code will split up the string & add DRI highlights to the list. The XSL for the themes also contains special rendering XSL for the DRI; for Mirage, the changes are located in the discovery.xsl file. For themes based on structural.xsl, look for the template matching "dri:list[@type='dsolist']".

"More like this" configuration

Warning

This paragraph only applies to XMLUI. The JSPUI does not currently support the "More like this" feature.

The "more like this" configuration element contains all the settings for displaying the related items on an item display page.
Below is an example of the "more like this" configuration.

Code Block
<property name="moreLikeThisConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoveryMoreLikeThisConfiguration">
        <property name="similarityMetadataFields">
            <list>
                <value>dc.contributor.author</value>
                <value>dc.creator</value>
                <value>dc.subject</value>
            </list>
        </property>
        <!--The minimum number of matching terms across the metadata fields above before an item is found as related -->
        <property name="minTermFrequency" value="5"/>
        <!--The maximum number of related items displayed-->
        <property name="max" value="3"/>
    </bean>
</property>

The property name & the bean class are mandatory. The property field names are discussed below.

  • similarityMetadataFields: the metadata fields checked for similarity
  • minTermFrequency: The minimum number of matching terms across the metadata fields above before an item is found as related
  • max: The maximum number of related items displayed

"More like this" technical details

The org.dspace.discovery.SearchService object has received a getRelatedItems() method. This method requires an item & the more-like-this configuration bean from above. It is implemented in org.dspace.discovery.SolrServiceImpl, which uses the item as a query & uses the default Solr parameters for more-like-this to pass the bean configuration to Solr (http://wiki.apache.org/solr/MoreLikeThis). The result will be a list of items or, if none are found, an empty list. The rendering of this list is handled in the org.dspace.app.xmlui.aspect.discovery.RelatedItems class.

Discovery

...

Solr Index Maintenance

Command used:

[dspace]/bin/dspace update-discovery-index [-cbhf[r <item handle>]]

Java class:

org.dspace.discovery.IndexClient

Arguments (short and long forms):

Description

 

called without any options, will update/clean an existing index

-b

(re)build index, wiping out current one if it exists

-c

clean existing index, removing any documents that no longer exist in the database

-f

if updating an existing index, force each handle to be reindexed even if it is up to date

-h

print this help message

-o

optimize search core

-r <item handle>

remove an Item, Collection or Community from index based on its handle

Routine Discovery

...

Solr Index Maintenance

It is strongly recommended to run maintenance on the Discovery Solr index daily (from crontab or your system's scheduler), to prevent your servlet container from running out of memory:

[dspace]/bin/dspace update-discovery-index -o

Advanced

...

Solr Configuration

Discovery is built as an application layer on top of the Solr open source enterprise search server. Therefore, Solr configuration can be applied to the Solr cores that are shipped with DSpace.
The DSpace Solr instance itself now runs two cores: one for DSpace Solr-based "statistics", the other for Discovery Solr-based "search".

...