This page is based on the deprecated "fcrepo-message-consumer". It will be updated to use the equivalent capabilities offered by "fcrepo-camel-toolbox".
See:
One of the major goals of this event-based indexing approach is to reduce the impact of indexing on core repository functionality. The repository just creates a JMS event (containing only the resource identifier and the event type, which are already in memory), and does not need to do any extra work for indexing before moving on to its next task. When repository updates happen at a faster rate than the indexer can match, JMS events can wait in the queue until the indexer catches up, and the updates can continue without waiting. When processing large batches of updates, you can even disable the indexer.
The indexer can have any number of workers configured to process the events. The main indexer process retrieves the resource RDF from the repository, and that content is then reused by every worker. If you want to process the events in several ways (triplestore, Solr, archive to disk, update remote repository, etc.), the metadata only has to be retrieved from the repository once each time the resource is updated.
Several different indexer modules exist for syncing with different systems:
The indexer is configured using Spring. Here is a sample configuration fragment showing three workers (saving RDF to disk, persisting jcr/xml, and syncing to a Jena Fuseki triplestore) and the framework for listening to events and connecting them with the workers:
<!-- Worker #1: Copy resource RDF to a Fuseki triplestore using SPARQL Update -->
<bean id="sparqlUpdate" class="org.fcrepo.indexer.SparqlIndexer">
  <!-- base URL for triplestore subjects, PID will be appended -->
  <property name="prefix" value="http://localhost:${test.port:8080}/rest/objects/"/>
  <property name="queryBase" value="http://localhost:3030/test/query"/>
  <property name="updateBase" value="http://localhost:3030/test/update"/>
  <property name="formUpdates">
    <value type="java.lang.Boolean">false</value>
  </property>
</bean>

<!-- Worker #2: Save resource RDF to timestamped files on disk -->
<bean id="fileSerializer" class="org.fcrepo.indexer.FileSerializer">
  <property name="path" value="./target/test-classes/fileSerializer/"/>
</bean>

<!-- jcr/xml persistence Indexer -->
<bean id="jcrXmlPersist" class="org.fcrepo.indexer.persistence.JcrXmlPersistenceIndexer">
  <constructor-arg value="${fcrepo.jcrxml.storage:fcrepo4-jcrxml}" />
</bean>

<!-- Main indexer class that processes events, gets RDF from the repository and calls the workers -->
<bean id="indexerGroup" class="org.fcrepo.indexer.IndexerGroup">
  <constructor-arg name="repositoryURL" value="http://${fcrepo.host:localhost}:${fcrepo.port:8080}${fcrepo.context:/}rest" />
  <constructor-arg name="indexers">
    <set>
      <ref bean="jcrXmlPersist"/>
      <ref bean="fileSerializer"/>
      <ref bean="sparqlUpdate"/>
    </set>
  </constructor-arg>
  <!-- If your Fedora instance requires authentication, enter the credentials here. Leave blank if your repo is open. -->
  <constructor-arg name="fedoraUsername" value="${fcrepo.username:}" />
  <constructor-arg name="fedoraPassword" value="${fcrepo.password:}" />
</bean>

<!-- ActiveMQ queue to listen for events -->
<bean id="destination" class="org.apache.activemq.command.ActiveMQTopic">
  <constructor-arg value="fedora" />
</bean>

<!-- Message listener container to connect the JMS queue to the indexer -->
<bean id="jmsContainer" class="org.springframework.jms.listener.DefaultMessageListenerContainer">
  <property name="connectionFactory" ref="connectionFactory"/>
  <property name="destination" ref="destination"/>
  <property name="messageListener" ref="indexerGroup" />
  <property name="sessionTransacted" value="true"/>
</bean>
To use another triplestore, change the SparqlIndexer bean configuration. Here is the bean configuration to use with Sesame running on port 8081:
<!-- Worker #1: Copy resource RDF to a Sesame triplestore using SPARQL Update -->
<bean id="sparqlUpdate" class="org.fcrepo.indexer.SparqlIndexer">
  <!-- base URL for triplestore subjects, PID will be appended -->
  <property name="prefix" value="http://localhost:${test.port:8080}/rest/objects/"/>
  <property name="queryBase" value="http://localhost:8081/openrdf-sesame/repositories/test"/>
  <property name="updateBase" value="http://localhost:8081/openrdf-sesame/repositories/test/statements"/>
  <property name="formUpdates">
    <value type="java.lang.Boolean">true</value>
  </property>
</bean>
To implement a new kind of indexer:
To get hands-on experience with the indexer and see updates synced with an external triplestore, you need three components. Each component will potentially run in its own application container. The three components are:
The triplestore and Fedora 4 do not need to be aware of each other or of the JMS listener. However, the event listener needs to know the web endpoints of both the triplestore and Fedora 4. When the components run on the same machine, it is therefore important that you start all three on different ports.
Instructions on how to start up and configure the three components follow:
You can deploy Fedora 4 either by downloading the latest war file and dropping it into an application container (e.g. Tomcat 7), or by cloning the Git fcrepo4 project and running the fcrepo-webapp directly within the code base.
See the following pages for details on either approach:
You can deploy the JMS event listener/indexer either by downloading the latest war file and dropping it into an application container (e.g. Tomcat 7), or by cloning the fcrepo-message-consumer project and running the fcrepo-message-consumer-pluggable directly within the code base. Building the project from source will likely make it easier to configure the JMS event listener/indexer.
You can specify the connection to either Fuseki or Sesame in the following configuration file.
To configure the JMS indexer to connect to the Fedora Repository, you can set the following Java system properties:
-Dfcrepo.host=<defaults.to.localhost>
-Dfcrepo.port=<defaults.to.8080>
To configure the JMS indexer to connect to the triplestore, you can set the following Java system properties:
-Dfuseki.host=<defaults.to.localhost>
-Dfuseki.port=<defaults.to.3030>
... or if you are using Sesame:
-Dsesame.host=<defaults.to.localhost>
-Dsesame.port=<defaults.to.8081>
Finally, you will potentially need to set the output directory for the FileSerializer (which is a testing class for showing what is being indexed):
-Dfile.serializer.dir=<defaults.to.webcontainer.target>
Below is an example of how to download, build, and start the JMS indexer.
$ git clone https://github.com/fcrepo4/fcrepo-message-consumer.git
$ cd fcrepo-message-consumer
$ mvn install
$ cd fcrepo-message-consumer
$ mvn -Dfcrepo.host=localhost -Dfcrepo.port=8080 -Dfuseki.host=localhost -Dfuseki.port=3030 -Djetty.port=8082 jetty:run
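The documented defaults can be made explicit in a small launcher helper. The following is only a sketch: the environment variable names (FCREPO_HOST, FCREPO_PORT, FUSEKI_HOST, FUSEKI_PORT, INDEXER_PORT) and the function name are hypothetical, and the Jetty port default of 8082 is simply the value used in the example above.

```shell
# Print the Maven arguments for starting the indexer against Fuseki.
# FCREPO_HOST, FCREPO_PORT, FUSEKI_HOST, FUSEKI_PORT and INDEXER_PORT are
# hypothetical environment variables introduced for this sketch; unset
# variables fall back to the defaults documented on this page.
indexer_mvn_args() {
  printf '%s %s %s %s %s jetty:run\n' \
    "-Dfcrepo.host=${FCREPO_HOST:-localhost}" \
    "-Dfcrepo.port=${FCREPO_PORT:-8080}" \
    "-Dfuseki.host=${FUSEKI_HOST:-localhost}" \
    "-Dfuseki.port=${FUSEKI_PORT:-3030}" \
    "-Djetty.port=${INDEXER_PORT:-8082}"
}
```

From the directory where you would otherwise run the last command above, `mvn $(indexer_mvn_args)` would then start the indexer with any overrides taken from the environment.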
If the Fedora Repository is running at http://localhost:8080/rest/, you can create, update and delete resources using your browser, or using the REST API (see SPARQL Recipes). Each event will trigger the indexer and be synced to Fuseki (or Sesame), which you can access at http://localhost:3030/ (if you have Fuseki running on its default port).
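One way to check that an update reached the triplestore is to ask its SPARQL query endpoint for the triples stored about a resource. The sketch below assumes the Fuseki query endpoint from the sample configuration (http://localhost:3030/test/query) and that triplestore subjects are the repository URLs of the resources; the function names and the example resource are hypothetical.

```shell
# Print a SPARQL query listing the triples stored for one subject URI.
subject_query() {
  printf 'SELECT ?p ?o WHERE { <%s> ?p ?o }\n' "$1"
}

# Run the query against a SPARQL endpoint using an HTTP GET request.
# Arguments: <subject URI> [query endpoint]
# The endpoint default is the queryBase value from the sample configuration.
query_triplestore() {
  curl -G -H 'Accept: application/sparql-results+json' \
    --data-urlencode "query=$(subject_query "$1")" \
    "${2:-http://localhost:3030/test/query}"
}
```

For example, `query_triplestore http://localhost:8080/rest/objects/foo` should return an empty result set before the resource is indexed and a list of predicate/object pairs afterwards.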
If you have a repository with existing content that you want to index, or have changed your indexing logic and want to reindex content, you can use the reindex REST API call in the indexer webapp.
To reindex the resource http://localhost:8080/rest/objects/ and all of its children:
$ curl -X POST -d baseURI=http://localhost:8080/rest/objects/ http://localhost:8082/reindex
To reindex just the resource http://localhost:8080/rest/objects/foo/, but not recursively reindex its children, add the recursive=false parameter:
$ curl -X POST -d baseURI=http://localhost:8080/rest/objects/foo/ -d recursive=false http://localhost:8082/reindex
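If you reindex often, the two calls above can be wrapped in one helper. This is a minimal sketch, not part of the indexer itself: the function name and the INDEXER_URL and DRY_RUN variables are hypothetical, and the indexer is assumed to listen on port 8082 as in the examples above.

```shell
# Reindex one resource via the indexer webapp's reindex endpoint.
# Arguments: <base URI> [recursive]; pass "false" to skip the children.
# INDEXER_URL and DRY_RUN are hypothetical variables used only in this sketch.
reindex() {
  base_uri=$1
  recursive=${2:-true}
  # Rebuild the positional parameters as the curl argument list.
  set -- -X POST -d "baseURI=${base_uri}"
  if [ "$recursive" = "false" ]; then
    set -- "$@" -d "recursive=false"
  fi
  set -- "$@" "${INDEXER_URL:-http://localhost:8082}/reindex"
  if [ -n "${DRY_RUN:-}" ]; then
    # Show the command instead of sending the request.
    printf '%s\n' "curl $*"
  else
    curl "$@"
  fi
}
```

Setting `DRY_RUN=1` before calling `reindex http://localhost:8080/rest/objects/foo/ false` prints the curl command without contacting the indexer, which is useful for checking the arguments first.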
In some situations it is desirable to have multiple Fedora repositories all feeding into a single external triplestore. To accomplish this, install and set up the three components (triplestore, Fedora 4 repository and JMS event listener/indexer) as follows:
Follow the instructions above to install the triplestore (Fuseki or Sesame) on one machine and start it.
Follow the instructions above to install two or more Fedora 4 repositories on different machines and start them.
Install the JMS event listener/indexer (https://github.com/fcrepo4/fcrepo-message-consumer) for each Fedora 4 repository installation and start the indexer with the following command:
$ mvn -Djetty.port=9999 -Dfuseki.host=<triplestore.host.name> -Dfcrepo.host=<repository.host.name> jetty:run
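With several repositories, it can help to generate the start command for each indexer from a list of hosts. The following sketch only prints the commands, one per repository, so that each can be run on the machine hosting the matching indexer; the function name and the host names in the usage example are hypothetical.

```shell
# Print one indexer start command per repository host.
# Arguments: <triplestore host> <repository host>...
indexer_commands() {
  triplestore=$1
  shift
  for repo in "$@"; do
    printf 'mvn -Djetty.port=9999 -Dfuseki.host=%s -Dfcrepo.host=%s jetty:run\n' \
      "$triplestore" "$repo"
  done
}
```

For example, `indexer_commands ts.example.org repo1.example.org repo2.example.org` prints two commands that share the same triplestore host and differ only in `-Dfcrepo.host`.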
Notes
To make a resource indexable in the triplestore, the resource needs to include the Indexable mixin type, http://fedora.info/definitions/v4/indexing#Indexable, which can be inserted through a SPARQL insert:
INSERT { <> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://fedora.info/definitions/v4/indexing#Indexable> } WHERE { }
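A sketch of how such an update might be sent to a resource, assuming the repository accepts SPARQL Update through an HTTP PATCH request with the application/sparql-update content type. The function names are hypothetical, and the statement is written in the INSERT ... WHERE form of SPARQL 1.1 Update.

```shell
# Print a SPARQL Update that adds the Indexable mixin to the target resource.
indexable_update() {
  printf '%s\n' \
    'INSERT {' \
    '  <> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://fedora.info/definitions/v4/indexing#Indexable> .' \
    '} WHERE { }'
}

# Send the update to one resource with HTTP PATCH.
# Argument: <resource URL>, for example a hypothetical
# http://localhost:8080/rest/objects/foo
mark_indexable() {
  indexable_update | curl -X PATCH \
    -H 'Content-Type: application/sparql-update' \
    --data-binary @- "$1"
}
```

Once the mixin is in place, later updates to the resource should produce events that the indexer syncs to the triplestore.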