Solr in DSpace

What is Solr: http://lucene.apache.org/solr/features.html

DSpace uses Solr as part of Discovery, as an index that speeds up access to content metadata and to data about access to DSpace (for statistics). Solr also provides faceting and filtering of search results. If Discovery is enabled, the DSpace search field accepts Solr search syntax.
Discovery has been an optional part of DSpace since 1.7 (with major improvements and configuration format changes in 1.8). When enabled, Discovery replaces DSpace Search and Browse and provides Solr-based statistics.

Connecting to Solr

By default, the DSpace Solr server is configured to accept connections only from localhost on port 8080. That means you cannot connect from another machine to port 8080 of the DSpace server and request a Solr URL; you will get an HTTP 403 error. This restriction exists for security reasons: the Solr index contains some data that is not accessible via the public DSpace interfaces, and some of that data might be sensitive.

While you could make Solr publicly accessible by changing this default configuration (if you want to do so, search for LocalHostRestrictionFilter), this is not recommended. Instead, use one of the following simple means to bypass the restriction temporarily. All of them make Solr accessible only to the machine you are connecting from, and only for as long as the connection is open.

  1. OpenSSH client - port forwarding
    Connect to the DSpace server and forward its port 8080 to port 1234 on localhost (the machine you are connecting from). This makes mydspace.edu:8080 accessible via localhost:1234 (type http://localhost:1234 in the browser address bar):
    ssh -L 1234:127.0.0.1:8080 mydspace.edu
    Exit ssh to terminate port forwarding.
    Run with the -N and -f flags if you want ssh to go to the background; kill the ssh process to terminate port forwarding:
    ssh -N -f -L 1234:127.0.0.1:8080 mydspace.edu
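  2. PuTTY client - port forwarding
    The same with PuTTY, under Connection - SSH - Tunnels (the source port is the port on your own machine, the destination is relative to the DSpace server):
    Source port: 1234
    Destination: localhost:8080
    Local
    Auto
    Add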
    
  3. OpenSSH client - SOCKS proxy
    Connect to the DSpace server and run a SOCKS proxy server on localhost port 1234; configure the browser to use localhost:1234 as its SOCKS proxy:
    ssh -D 1234 mydspace.edu
    All browser requests now originate from the DSpace server (the source IP is the DSpace server's IP); the DSpace server acts as the proxy.
    Type http://localhost:8080 in the browser address bar; localhost here is the DSpace server.
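
If you open the tunnel from a script rather than by hand, the port-forwarding command above can be assembled programmatically. The following is only a minimal sketch: the function names are hypothetical, and the host and port values are the example values used on this page. It builds the argument list but does not start ssh itself.

```python
def ssh_forward_args(host, local_port=1234, remote_port=8080, background=False):
    """Return the argv list for an OpenSSH local port forward (ssh -L)."""
    args = ["ssh"]
    if background:
        # -N: do not run a remote command, -f: go to background after authentication
        args += ["-N", "-f"]
    args += ["-L", f"{local_port}:127.0.0.1:{remote_port}", host]
    return args


def tunnelled_base_url(local_port=1234):
    """Base Solr URL as seen from the local machine once the tunnel is open."""
    return f"http://localhost:{local_port}/solr"


# Print the command line and the URL to use through the tunnel.
print(" ".join(ssh_forward_args("mydspace.edu")))
print(tunnelled_base_url())
```

The resulting list can be passed to subprocess.run() if you want the script to open the tunnel itself.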

Accessing Solr

Solr cores

DSpace contains a so-called multicore installation of Solr. That means that there are multiple Solr indexes and configurations sharing one Solr codebase. If you are familiar with Apache HTTPD, it is analogous to multiple virtual hosts running on one Apache server (each with separate configuration and web pages), except that individual Solr cores are accessible via different URLs (as opposed to a virtual host's IP:port).

The two Solr cores in DSpace Discovery are called "search" and "statistics". The search core contains data about communities, collections, items and bitstreams. The statistics core contains data about searches, accessing users, IPs etc. The two cores are accessible at the following URLs (relative to the DSpace server):

http://localhost:8080/solr/search/
http://localhost:8080/solr/statistics/

Solr admin interface

Both Solr cores have separate administration interfaces, which let you view their respective schemas and configurations, set up logging and submit queries. The schema browser is very useful for listing the fields (and their types) included in each index, and it even shows an overview of the most common values of individual fields with their frequencies.

http://localhost:8080/solr/search/admin/
http://localhost:8080/solr/statistics/admin/

Solr queries

The base URLs of the default Solr search handler for the two cores are as follows:

http://localhost:8080/solr/search/select
http://localhost:8080/solr/statistics/select

Using your knowledge of the fields listed in the Solr admin interface and of Solr syntax (SolrQuerySyntax, CommonQueryParameters), you can build your own search requests.
You can also watch the Tomcat log file to see the Solr queries as they are executed:

tail -f /var/log/tomcat6/catalina.out

(depending on your Tomcat installation method, the path may be different)
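
When building your own requests, it helps to let a library handle the URL encoding of parameters. The sketch below assumes the /select handler path used by the examples on this page; the helper name solr_select_url and its default base URL are illustrative choices, not part of DSpace or Solr.

```python
from urllib.parse import urlencode, quote


def solr_select_url(core, params, base="http://localhost:8080/solr"):
    """Build a Solr select URL for the given core ("search" or "statistics").

    Colons, asterisks and commas are left unescaped so that the result stays
    readable (q=field:value, fl=*,score); spaces are encoded as %20.
    """
    query = urlencode(params, quote_via=quote, safe=":*,")
    return f"{base}/{core}/select?{query}"


# Example: the first five items from the search core, returned as JSON.
print(solr_select_url("search", {"q": "search.resourcetype:2", "rows": 5, "wt": "json"}))
```

The printed URL can be pasted into the browser or fetched with any HTTP client, provided Solr is reachable from where the request is made (see Connecting to Solr).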

Solr responses

By default, Solr responses are returned in XML format. However, Solr can provide several other output formats, including JSON and CSV. Discovery itself uses the javabin format. The output format is selected with the wt request parameter (e.g. &wt=json). For more information, see Response Writers, QueryResponseWriters.
An interesting option is to specify an XSLT stylesheet that transforms the XML response (server-side) to any format you choose, typically HTML. The .xsl files must be placed in the dspace/solr/search/conf/xslt/ directory. Append &wt=xslt&tr=example.xsl to the Solr request URL.
For more information, see XsltResponseWriter.
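
For scripted access, the JSON format (&wt=json) is usually the easiest to consume. The following sketch parses a JSON response body and returns the hit count and the documents. The sample response is made up for illustration (the handle field and its values are assumptions); a real response contains whatever fields your index and fl parameter provide.

```python
import json

# Hypothetical response body, shaped like Solr's wt=json output.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"handle": "123456789/1"},
      {"handle": "123456789/2"}
    ]
  }
}
"""


def docs_from_json(text):
    """Return (numFound, docs) from a Solr JSON response body."""
    body = json.loads(text)
    result = body["response"]
    return result["numFound"], result["docs"]


found, docs = docs_from_json(SAMPLE_RESPONSE)
print(found, [doc["handle"] for doc in docs])
```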

Examples

Date of last deposited item

To get all items (search.resourcetype:2) sorted by date accessioned (dc.date.accessioned_dt) in order from newest to oldest (desc):

http://localhost:8080/solr/search/select?q=search.resourcetype:2&sort=dc.date.accessioned_dt%20desc

Note:
search.resourcetype:2 - items
search.resourcetype:3 - collections
search.resourcetype:4 - communities

To get only the first (newest) item (rows=1) with all but the date accessioned field filtered out (fl=dc.date.accessioned) and without the Solr response header (omitHeader=true):

http://localhost:8080/solr/search/select?q=search.resourcetype:2&sort=dc.date.accessioned_dt%20desc&rows=1&fl=dc.date.accessioned&omitHeader=true
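
The response to this request is in Solr's default XML format, so a script can pull the date out with a standard XML parser. This is a sketch under assumptions: the sample response below is invented for illustration (including the date value and whether the field is returned as a single value or as an array), and first_field_value is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body for the request above (omitHeader=true, rows=1).
SAMPLE_XML = """<response>
  <result name="response" numFound="42" start="0">
    <doc>
      <arr name="dc.date.accessioned">
        <str>2012-03-01T10:15:30Z</str>
      </arr>
    </doc>
  </result>
</response>"""


def first_field_value(xml_text, field):
    """Return the first value of `field` in the first returned doc, or None."""
    root = ET.fromstring(xml_text)
    doc = root.find("./result/doc")
    if doc is None:
        return None
    for element in doc:
        if element.get("name") == field:
            # Works for single-valued (<str>, <date>) and multi-valued (<arr>) fields.
            for chunk in element.itertext():
                if chunk.strip():
                    return chunk.strip()
    return None


print(first_field_value(SAMPLE_XML, "dc.date.accessioned"))
```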

Top downloaded items by a specific user

http://localhost:8080/solr/statistics/select?indent=on&version=2.2&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=&facet=true&facet.field=epersonid&q=type:0

Note:
facet.field=epersonid - groups (facets) the results by epersonid, which is the user ID.
type:0 - restricts the query to bitstreams only.
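
If you request this facet with &wt=json instead of wt=standard, Solr by default returns each facet field as a flat list of alternating values and counts. The sketch below pairs them up; the sample data is invented for illustration, and facet_counts is a hypothetical helper name.

```python
import json

# Hypothetical excerpt of a wt=json response; values and counts alternate.
SAMPLE_FACETS = """
{
  "facet_counts": {
    "facet_fields": {
      "epersonid": ["12", 40, "7", 15, "3", 2]
    }
  }
}
"""


def facet_counts(text, field):
    """Return [(value, count), ...] for one facet field of a Solr JSON response."""
    body = json.loads(text)
    flat = body["facet_counts"]["facet_fields"][field]
    # Even positions hold the facet values, odd positions hold their counts.
    return list(zip(flat[0::2], flat[1::2]))


for eperson_id, downloads in facet_counts(SAMPLE_FACETS, "epersonid"):
    print(eperson_id, downloads)
```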
