The out-of-the-box functions support all of our current solution packs, along with the MODS and DC metadata streams associated with them. Once you have GSearch installed and running, there is very little you need to do. However, you may wish to customize Solr to index a new metadata schema (for example, if you are creating a custom content model) or to modify existing fields. To do this, modify the foxmlToSolr.xslt file located in the GSearch webapps directory. If you followed the instructions for installing GSearch in Chapter 9 - Enabling Indexing & Searching with Solr, the file is located here:
/usr/local/fedora/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/gsearch_solr
For example, to add Darwin Core to the index, you can add the following lines to the XSLT:
<xsl:for-each select="foxml:datastream/foxml:datastreamVersion[last()]/foxml:xmlContent/dwc:SimpleDarwinRecordSet/dwc:SimpleDarwinRecord/*">
  <xsl:if test="text()[normalize-space(.)]"><!--don't bother with empty space-->
    <field>
      <xsl:attribute name="name">
        <xsl:value-of select="concat('dwc.', substring-after(name(),':'))"/>
      </xsl:attribute>
      <xsl:value-of select="normalize-space(text())"/>
    </field>
  </xsl:if>
</xsl:for-each>
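As a rough illustration of what this template emits: for each non-empty child element of the record, it takes the part of the element name after the namespace prefix and prepends `dwc.`. The field values below are invented for the example and do not come from a real record.

```xml
<!-- Illustrative output only: the values are made up, not taken from a real record -->
<field name="dwc.scientificName">Acer rubrum</field>
<field name="dwc.country">Canada</field>
<field name="dwc.language">en</field>
```

Note that `substring-after(name(),':')` returns an empty string when the element name has no prefix, so an unprefixed child element would produce a field named just `dwc.`. The template assumes every child element is written with a prefix.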
The XSL above will index most Darwin Core fields. Once GSearch is aware of the new schema, you can make Solr aware of it by modifying the schema.xml. Quite often, it makes sense to assign the same content to two fields, as the following example shows.
The XSL above will create many fields, one of which is dwc.language. In the Solr schema we would add a declaration for this field:
<field name="dwc.language" type="text" indexed="true" stored="true" multiValued="true"/>
Here, we have given it type="text", which in the default schema is analyzed.
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
The types are also defined in the schema.xml. If we want to use this field in a filter, it makes sense to also store it unanalyzed under a different name. This requires two more entries in the schema.xml:
<field name="language" type="string" maxChars="300" indexed="true" stored="true" multiValued="true"/>
<copyField source="dwc.language" dest="language"/>
Once we have created a field named language to hold the unanalyzed data, we use copyField to copy the dwc.language field into it. The copy happens during indexing, before the content is analyzed. Notice that the type of the new field is string. We can now use these fields in Solr request handlers. Request handlers determine which fields to search and which to return, and they let you assign some fields more weight than others.
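Before moving on to request handlers, note that the same pattern can be repeated for any other Darwin Core field you want available unanalyzed. The following is a minimal sketch for scientificName. The choice of field is only an illustration, and the attribute values simply mirror the language example above.

```xml
<!-- Hypothetical example: the language pattern applied to scientificName -->
<!-- Analyzed field populated by the XSLT -->
<field name="dwc.scientificName" type="text" indexed="true" stored="true" multiValued="true"/>
<!-- Unanalyzed copy for filtering and display -->
<field name="scientificName" type="string" indexed="true" stored="true" multiValued="true"/>
<copyField source="dwc.scientificName" dest="scientificName"/>
```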
A request handler may look like this:
<requestHandler name="herbarium" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="qf">dwc.type^2.0 dwc.language^2.0 dwc.rightsHolder^2.0 dwc.accessRights^2.0 dwc.rights^2.0 dwc.basisOfRecord^2.0 dwc.scientificName^2.0 dwc.vernacularName^2.0 dwc.kingdom^2.0 dwc.phylum^2.0 dwc.class^2.0 dwc.order^2.0 dwc.family^2.0 dwc.genus^2.0 dwc.specificEpithet^2.0 dwc.continent^2.0 dwc.country^2.0 dwc.countryCode^2.0 dwc.stateProvince^2.0 dwc.county^2.0 dwc.municipality^2.0 dwc.verbatimLocality^2.0 dwc.decimalLatitude^2.0 dwc.decimalLongitude^2.0 dwc.occurrenceID^2.0 dwc.institutionCode^2.0 dwc.collectionCode^2.0 dwc.catalogNumber^2.0 dwc.recordedBy^2.0 dwc.eventDate^2.0 PID^0.5</str>
    <str name="fl">rightsHolder, accessRights, rights, basisOfRecord, scientificName, vernacularName, kingdom, phylum, class, order, family, genus, specificEpithet, continent, country, countryCode, stateProvince, county, municipality, verbatimLocality, decimalLatitude, decimalLongitude, occurrenceID, institutionCode, collectionCode, catalogNumber, recordedBy, eventDate, PID</str>
    <str name="q.alt">*:*</str>
  </lst>
  <lst name="appends">
    <str name="fq">PID:herbarium*</str>
  </lst>
</requestHandler>
Some interesting things to take note of:
The request handler example shown above limits the results to objects that have the herbarium namespace.
In the qf parameter, we search fields such as dwc.type and dwc.language. All of the dwc fields are weighted the same (2.0), while PID is weighted lower (0.5). We can tweak the weights later if we wish to customize the results. The fields Solr returns are those listed in the <str name="fl"> element. This gives us clean values to use when displaying the results and when listing facets.
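As a sketch of what tweaking the weights might look like, the qf entry below boosts name matches above the other fields. The specific weights and the shortened field list are hypothetical and chosen only to show the syntax; suitable values depend on your collection.

```xml
<!-- Hypothetical weights, chosen only to illustrate the field^boost syntax -->
<str name="qf">dwc.scientificName^4.0 dwc.vernacularName^3.0 dwc.family^2.0 dwc.genus^2.0 PID^0.5</str>
```

Each entry is a field name followed by `^` and its boost, and entries are separated by spaces.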