...

Realtime import of bibliographic records

In addition to the providers already available in a standard DSpace installation, such as ArXiv, PubMed, Cinii, CrossRef and generic OAI-PMH providers, DSpace-CRIS adds to the Submission Lookup Step the ability to query PubMed Europe, Scopus, SciVal and/or Web of Science.

...

pubmedEuropeXML enables the use of the XML format specific to PubMed Europe.

The metadata mapping is defined in the bean

Code Block
<bean name="pubmedEuropeInputMap" class="java.util.HashMap" scope="prototype">
	<constructor-arg>
		<map key-type="java.lang.String" value-type="java.lang.String">
			<entry key="pmid" value="pubmedID" />
			<entry key="pmcid" value="pubmedcentralID" />
			<entry key="doi" value="doi" />
			<entry key="ISSN" value="jissn" />
			<entry key="EISSN" value="jeissn" />
			<entry key="jTitle" value="journal" />
			<entry key="startPage" value="firstpage" />
			<entry key="endPage" value="lastpage" />
			<entry key="title" value="title" />
			<entry key="pubDate" value="issued" />
			<entry key="volume" value="volume" />
			<entry key="issue" value="issue" />
			<entry key="language" value="language" />				
			<entry key="pubType" value="subtype" />
			<entry key="keyword" value="keywords" />
			<entry key="primaryMeshHeading" value="meshheadings" />
			<entry key="secondaryMeshHeading" value="meshqualifiers" />
			<entry key="abstractText" value="abstract" />
			<entry key="author" value="authors" />
			<entry key="investigator" value="investigators" />
			<entry key="publisher" value="publisher" />
			<entry key="series" value="seriestitle" />
			<entry key="bookTitle" value="booktitle" />
			<entry key="isbn" value="pisbn" />
			<entry key="sISSN" value="sissn" />
			<entry key="edition" value="editionnumber" />
			<entry key="url" value="url" />
			<entry key="uri" value="uri" />
		</map>
	</constructor-arg>
</bean>

Scopus

The class org.dspace.submit.lookup.ScopusOnlineDataLoader is the implementation of the submission lookup interface that enables the integration with the Scopus API.

...

Code Block
<bean name="multipleDataLoader" class="org.dspace.submit.lookup.MultipleSubmissionLookupDataLoader" scope="prototype">
	    <property name="dataloadersMap">
	        <map>
					...
				<!-- <entry key="scopus" value-ref="scopusOnlineDataLoader"/> -->
				...				

The mapping is defined in the following bean:

Code Block
<bean name="scopusInputMap" class="java.util.HashMap" scope="prototype">
	<constructor-arg>
		<map key-type="java.lang.String" value-type="java.lang.String">
			<entry key="url" value="url" />
			<entry key="eid" value="eid" />
			<entry key="doi" value="doi" />
			<entry key="pmid" value="pubmedID" />
			<entry key="title" value="title" />
			<entry key="itemType" value="subtype" />
			<entry key="scopusType" value="providerType" />
			<entry key="sourceTitle" value="journal" />
			<entry key="isbn" value="pisbn" />
			<entry key="issn" value="jissn" />
			<entry key="eissn" value="eissn" />
			<entry key="issued" value="issued" />
			<entry key="volume" value="volume" />
			<entry key="issue" value="issue" />
			<entry key="spage" value="firstpage" />
			<entry key="epage" value="lastpage" />
			<entry key="description" value="abstract" />
			<entry key="scopusKeywords" value="keywords" />
			<entry key="articlenumber" value="articlenumber" />
			<entry key="authors" value="authors" />
			<entry key="authorUrl" value="authorUrl" />
			<entry key="authorScopusid" value="authorScopusID" />
			<entry key="orcid" value="orcid" />
		</map>
	</constructor-arg>
</bean>
Tip

The Scopus online data provider exposes the ORCID, Scopus ID and Scopus author URL for each author; when a value is not available for one or more authors, the placeholder value #NODATA# is used. This allows the metadata filler functionality to create richer author profiles from the publication. It also opens the way to the future development of custom BTE processors that can look up existing researcher profiles using these IDs in addition to the name.
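
As a rough illustration of how the placeholder can be consumed, the following Python sketch pairs each author with the corresponding identifiers and treats #NODATA# as a missing value. It assumes that the values arrive as parallel lists in author order; that layout, the function name, the dictionary keys and the sample values are all assumptions made for this example and are not part of DSpace-CRIS.

```python
NODATA = "#NODATA#"  # placeholder value named in the tip above


def build_author_profiles(authors, orcids, scopus_ids, author_urls):
    """Pair each author with its identifiers, mapping the placeholder to None.

    Assumption: the four lists are parallel (same length, same author order).
    """
    lists = (authors, orcids, scopus_ids, author_urls)
    if len({len(values) for values in lists}) != 1:
        raise ValueError("expected one value (or placeholder) per author in every list")

    def clean(value):
        # The placeholder only keeps the lists aligned; it carries no data.
        return None if value == NODATA else value

    return [
        {"name": name, "orcid": clean(orcid), "scopus_id": clean(sid), "scopus_url": clean(url)}
        for name, orcid, sid, url in zip(*lists)
    ]


# Made-up sample values, for illustration only.
profiles = build_author_profiles(
    ["Rossi, Mario", "Smith, Jane"],
    ["0000-0000-0000-0001", NODATA],
    [NODATA, "1234567890"],
    [NODATA, "https://example.org/author/1234567890"],
)
for profile in profiles:
    print(profile)
```

A custom processor could then match a researcher profile on any identifier that is not None and fall back to the name otherwise.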

 

SciVal

The class org.dspace.submit.lookup.SciValOnlineDataLoader is the implementation of the submission lookup interface that enables the integration with the SciVal API.

...

Code Block
<bean name="multipleDataLoader" class="org.dspace.submit.lookup.MultipleSubmissionLookupDataLoader" scope="prototype">
	    <property name="dataloadersMap">
	        <map>
					...
				<!-- <entry key="scopus" value-ref="scivalOnlineDataLoader"/> -->
				...				

...

The mapping is defined in the following bean:

Code Block
<bean name="scivalInputMap" class="java.util.HashMap" scope="prototype">
	<constructor-arg>
		<map key-type="java.lang.String" value-type="java.lang.String">
			<entry key="eid" value="eid" />
			<entry key="doi" value="doi" />
			<entry key="issn" value="jissn" />
			<entry key="eissn" value="jeissn" />
			<entry key="isbn" value="pisbn" />
			<entry key="journalTitle" value="journal" />
			<entry key="title" value="title" />
			<entry key="year" value="issued" />
			<entry key="volume" value="volume" />
			<entry key="issue" value="issue" />
			<entry key="edition" value="edition" />
			<entry key="startPage" value="firstpage" />
			<entry key="endPage" value="lastpage" />
			<entry key="authors" value="authors" />
			<entry key="chairs" value="chairs" />
			<entry key="affiliations" value="affiliations" />
			<entry key="articleNumber" value="articleNumber" />
			<entry key="authorsWithAffiliations" value="authorsWithAffiliations" />
			<entry key="displayUrl" value="scopusUrl" />
			<entry key="citationCount" value="scopusCitation" />
			<entry key="citationUrl" value="scopusCitationUrl" />
			<entry key="url" value="url" />
			<entry key="classificationASJC" value="classificationASJC" />
			<entry key="keywords" value="keywords" />
			<entry key="language" value="language" />
			<entry key="abstracts" value="abstract" />
			<entry key="abstractita" value="abstractita" />
			<entry key="abstracteng" value="abstracteng" />
			<entry key="abstractfre" value="abstractfre" />
			<entry key="abstractger" value="abstractger" />
			<entry key="abstractesp" value="abstractesp" />
			<!-- <entry key="issueDate" value="issued" /> -->
			<entry key="medium" value="medium" />
			<entry key="titleAlternative" value="titlealternative" />
			<entry key="issueTitle" value="issuetitle" />
			<entry key="conferenceName" value="conferenceName" />
			<entry key="conferenceNumber" value="conferenceNumber" />
			<entry key="conferencePlace" value="conferencePlace" />
			<entry key="conferenceYear" value="conferenceYear" />
			<entry key="conferenceSponsor" value="sponsor" />
			<entry key="conferenceTarget" value="conferencetarget" />
			<entry key="supplement" value="supplement" />
			<entry key="scpId" value="scopusid" />
			<entry key="medlineId" value="medlineid" />
			<entry key="bookTitle" value="booktitle" />
			<!-- <entry key="#sourceAuthor" value="" /> -->
			<!-- <entry key="#sourceTranslator" value="" /> -->
			<entry key="publisherName" value="publisher" />
			<entry key="publisherPlace" value="publisherPlace" />
			<entry key="publisherCountry" value="publisherCountry" />
			<entry key="internationalAuthor" value="internationalauthor" />
			<entry key="itemType" value="subtype" />
		</map>
	</constructor-arg>
</bean>

Web of Knowledge

The class org.dspace.submit.lookup.WOSOnlineDataLoader is the implementation of the submission lookup interface that enables the integration with the Web of Knowledge WokSearch API.

...

Code Block
<bean name="multipleDataLoader" class="org.dspace.submit.lookup.MultipleSubmissionLookupDataLoader" scope="prototype">
	    <property name="dataloadersMap">
	        <map>
					...
				<!-- <entry key="wos" value-ref="wosOnlineDataLoader"/> -->
				...				

The mapping is defined in the following bean:

Code Block
<bean name="wosInputMap" class="java.util.HashMap" scope="prototype">
	<constructor-arg>
		<map key-type="java.lang.String" value-type="java.lang.String">
			<entry key="isiId" value="isiId" />
			<entry key="doi" value="doi" />
			<entry key="issn" value="jissn" />
			<entry key="journalTitle" value="journal" />
			<entry key="title" value="title" />
			<entry key="year" value="issued" />
			<entry key="volume" value="volume" />
			<entry key="issue" value="issue" />
			<entry key="startPage" value="firstpage" />
			<entry key="endPage" value="lastpage" />
			<entry key="authors" value="authors" />
			<entry key="citationCount" value="wosCitation" />
			<entry key="keywords" value="keywords" />
			<entry key="language" value="language" />
			<entry key="abstracts" value="abstract" />
			<entry key="abstractita" value="abstractita" />
			<entry key="abstracteng" value="abstracteng" />
			<entry key="abstractfre" value="abstractfre" />
			<entry key="abstractger" value="abstractger" />
			<entry key="abstractesp" value="abstractesp" />
			<entry key="publisherName" value="publisher" />
			<entry key="publisherPlace" value="publisherPlace" />
			<entry key="publisherCountry" value="publisherCountry" />
			<entry key="itemType" value="subtype" />
			<entry key="wosType" value="providerType" />
		</map>
	</constructor-arg>
</bean>

Periodic scanning of the external database

...

Info
Currently, no special operations are performed by the retrieval scripts to guess a mapping between the publication's authors and the researcher profiles already defined in the system. 

All the scripts use the corresponding BTE online data providers to convert each provider's publication representation (scopus, wos, pubmed) to the internal DSpace metadata. This means that the mapping is defined in [dspace-installDir]/config/spring/bte.xml (see above).

PubMed Europe

The DSpace script to invoke is

Code Block
./dspace dsrun org.dspace.app.cris.batch.PMCEuropeFeed -p submitter -c collectionID [-q query] [-s start_date(yyyy-mm-dd)] [-e end_date(yyyy-mm-dd)] [-t] [-m <metadata-for-pmid>] [-n <metadata-for-pmcid>]

-p the email address of the user that will be used to create / update items

-c the target collection for new items

-q the search query for PubMed. If not specified, it is retrieved from the configuration file

-s the start date to consider for new / updated records in PubMed. By default the script searches for changes since its previous successful execution, or since today when executed for the first time

-e the end date to consider (useful in conjunction with start_date to "recover" past records)

-t the script is executed in DRY-RUN mode: the retrieved records are only displayed

-m the metadata field used to store the pmid identifier, default dc.identifier.pmid

-n the metadata field used to store the pmcid identifier, default dc.identifier.pmcid

The script uses the configuration file [dspace-installDir]/config/modules/pmceuropefeed.cfg to get default values for some of the above options when they are not specified on the command line, as well as additional configuration properties such as the service endpoint URL.
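
The options above can be combined, for example, into a dry run over a fixed date range. The Python sketch below only assembles and prints the command line; it does not execute anything. The helper name, the submitter e-mail address and the collection ID are hypothetical, and only flags listed above are used.

```python
import shlex


def pmc_europe_feed_command(submitter, collection_id, query=None,
                            start_date=None, end_date=None, dry_run=False):
    """Build the argument list for the PMCEuropeFeed script (nothing is executed)."""
    command = [
        "./dspace", "dsrun", "org.dspace.app.cris.batch.PMCEuropeFeed",
        "-p", submitter,
        "-c", str(collection_id),
    ]
    if query is not None:
        command += ["-q", query]
    if start_date is not None:
        command += ["-s", start_date]  # expected format: yyyy-mm-dd
    if end_date is not None:
        command += ["-e", end_date]    # expected format: yyyy-mm-dd
    if dry_run:
        command.append("-t")           # only display the retrieved records
    return command


# Hypothetical submitter e-mail and collection ID.
command = pmc_europe_feed_command(
    "submitter@example.org", 123,
    start_date="2016-01-01", end_date="2016-01-31", dry_run=True,
)
print(shlex.join(command))
```

Running with -t first is a cheap way to check the query and date range before any item is created or updated.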

Scopus

The DSpace script to invoke is

Code Block
./dspace dsrun org.dspace.app.cris.batch.ScopusFeed -q query -p submitter -s start_date(yyyy-mm-dd) -e end_date(yyyy-mm-dd) [-f] -c collectionID

...

-p the email address of the user that will be used to create / update items

-c the target collection for new items to use when a specific mapping is not defined in the configuration file

-f will force the script to use the specified collection (-c) for all the found items ignoring the mapping defined in the configuration file

-q the search query for Scopus. If not specified, it is retrieved from the configuration file

-s the start date to consider for new / updated records in Scopus. By default the script searches for changes from yesterday

-e the end date to consider (useful in conjunction with start_date to "recover" past records)

The script uses the configuration file [dspace-installDir]/config/modules/scopusfeed.cfg to get default values for some of the above options when they are not specified on the command line, as well as additional configuration properties such as the service endpoint URL and the mapping between Scopus publication types and collections:

Code Block
# Article
# scopus.type.Article.collectionid=1
# Abstract Report
# scopus.type.Abstract\ Report.collectionid=1
# Article in Press
# scopus.type.Article\ in\ Press.collectionid=1
# Book
# scopus.type.Book.collectionid=1
...
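
The interplay between the type mapping, -c and -f can be summarized with a small Python sketch. It shows how a publication type containing spaces becomes a property key with escaped spaces, and how the target collection would be chosen according to the option descriptions above. The function names and the sample mapping are assumptions made for illustration; the actual script may resolve collections differently.

```python
def type_property_key(provider, publication_type):
    """Return the property key for a publication type, escaping spaces with a backslash."""
    escaped = publication_type.replace(" ", "\\ ")
    return f"{provider}.type.{escaped}.collectionid"


def resolve_collection(publication_type, type_mapping, default_collection, force=False):
    """Pick the target collection for an item.

    force=True mirrors -f: always use the collection given with -c.
    Otherwise the per-type mapping wins and -c is only the fallback.
    """
    if force:
        return default_collection
    return type_mapping.get(publication_type, default_collection)


# Hypothetical mapping, keyed by the plain (unescaped) type name.
mapping = {"Article": 1, "Book": 2}

print(type_property_key("scopus", "Article in Press"))
print(resolve_collection("Book", mapping, default_collection=9))
print(resolve_collection("Book", mapping, default_collection=9, force=True))
print(resolve_collection("Letter", mapping, default_collection=9))
```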

Web of Knowledge

The DSpace script to invoke is

Code Block
./dspace dsrun org.dspace.app.cris.batch.WosFeed -q query -p submitter -s start_date(yyyy-mm-dd) -e end_date(yyyy-mm-dd) [-f] -c collectionID

...

-p the email address of the user that will be used to create / update items

-c the target collection for new items to use when a specific mapping is not defined in the configuration file

-f will force the script to use the specified collection (-c) for all the found items ignoring the mapping defined in the configuration file

-q the search query for Web of Knowledge. If not specified it is retrieved from the configuration file

-s the start date to consider for new / updated records in Web of Knowledge. By default the script searches for changes from yesterday

-e the end date to consider (useful in conjunction with start_date to "recover" past records)

The script uses the configuration file [dspace-installDir]/config/modules/wosfeed.cfg to get default values for some of the above options when they are not specified on the command line, as well as additional configuration properties such as the service endpoint URL and the mapping between WoK publication types and collections:

Code Block
# wos.type.Article.collectionid=7
# wos.type.Abstract\ of\ Published\ Item.collectionid=7
# wos.type.Art\ Exhibit\ Review.collectionid=7
# wos.type.Bibliography.collectionid=7
...

Retrieval of bibliometric data (citation count)

PubMed Central

The system is able to query PubMed Central (PMC) to retrieve the list of citing publications for each publication in DSpace with a pmid. The functionality relies on the metadata dc.identifier.pmid to hold the pmid. A utility script is provided to enrich items that have a DOI or a PMCID with the pmid identifier.

The script is

Code Block
org.dspace.app.cris.metrics.pmc.script.RetrievePubMedID

It queries the pmc SOLR core using the known identifiers (dc.identifier.doi and/or dc.identifier.pmcid) and adds the resulting dc.identifier.pmid if found.

The pmc SOLR core is populated from a dump of the pmc database available for free as csv file at the following URL

ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz

Once the file has been downloaded and gunzipped, the bash script

Code Block
[dspace-installDir]/bin/pubmed-central-retrieve

loads the CSV in the SOLR core for fast querying.
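
To make the lookup concrete, the Python sketch below reproduces the idea with an in-memory index instead of the SOLR core: it reads CSV rows and resolves a pmid from a DOI or a PMCID. The column names DOI, PMCID and PMID are assumed to match the header of the downloaded file, the sample rows are invented, and trying the DOI before the PMCID is a choice made for this example only.

```python
import csv
import io

# Invented sample rows; the real file has more columns and millions of rows.
SAMPLE_CSV = """DOI,PMCID,PMID
10.1000/example.1,PMC0000001,11111111
,PMC0000002,22222222
10.1000/example.3,PMC0000003,
"""


def build_index(csv_file):
    """Index pmid values by DOI and by PMCID, skipping rows without a pmid."""
    by_doi, by_pmcid = {}, {}
    for row in csv.DictReader(csv_file):
        pmid = (row.get("PMID") or "").strip()
        if not pmid:
            continue
        doi = (row.get("DOI") or "").strip().lower()  # DOIs are case-insensitive
        pmcid = (row.get("PMCID") or "").strip()
        if doi:
            by_doi[doi] = pmid
        if pmcid:
            by_pmcid[pmcid] = pmid
    return by_doi, by_pmcid


def find_pmid(by_doi, by_pmcid, doi=None, pmcid=None):
    """Return the pmid for the given identifiers, or None when nothing matches."""
    if doi and doi.lower() in by_doi:
        return by_doi[doi.lower()]
    if pmcid and pmcid in by_pmcid:
        return by_pmcid[pmcid]
    return None


by_doi, by_pmcid = build_index(io.StringIO(SAMPLE_CSV))
print(find_pmid(by_doi, by_pmcid, doi="10.1000/EXAMPLE.1"))
print(find_pmid(by_doi, by_pmcid, pmcid="PMC0000002"))
print(find_pmid(by_doi, by_pmcid, doi="10.1000/example.3"))
```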

Info
The process should be performed periodically if you don't plan to collect the pmid in the submission

Once your dspace items (publications) have the dc.identifier.pmid correctly set you can use the bash script

Code Block
[dspace-installDir]/bin/pubmed-retrieve-citation-second

to invoke all the DSpace scripts needed to retrieve the PMC citation list, store the count as a metric (pubmed) of the DSpace items, build the basic derivative metrics such as percentile and variation over one week / month, and aggregate the values at the researcher level.

Scopus

Code Block
[dspace-installDir]/bin/scopus-retrieve

The bash script executes all the DSpace scripts needed to:

  • retrieve the citation count from Scopus (max 5000 publications per execution, ignoring publications whose citation count is less than 7 days old)
  • count the number of publications in Scopus (those with a dc.identifier.eid)
  • aggregate the metrics at the researcher level
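
The selection rule in the first step can be sketched in Python as follows. This is only an illustration of the "skip recent counts, cap the batch" logic under stated assumptions: the record layout, the field name last_count_date, the treatment of the 7-day boundary and the processing order are all hypothetical.

```python
from datetime import datetime, timedelta


def select_for_refresh(publications, now, max_items=5000, min_age_days=7):
    """Return the publications whose citation count should be retrieved again.

    A publication is due when it has no count yet or when its last count
    is older than min_age_days; at most max_items are returned per run.
    """
    cutoff = now - timedelta(days=min_age_days)
    due = [
        pub for pub in publications
        if pub["last_count_date"] is None or pub["last_count_date"] < cutoff
    ]
    return due[:max_items]


# Invented sample records.
now = datetime(2016, 3, 15)
publications = [
    {"eid": "eid-1", "last_count_date": None},                   # never counted
    {"eid": "eid-2", "last_count_date": datetime(2016, 3, 12)},  # 3 days old: skipped
    {"eid": "eid-3", "last_count_date": datetime(2016, 3, 1)},   # 14 days old: due
]
selected = select_for_refresh(publications, now)
print([pub["eid"] for pub in selected])
```

With a cap per execution, a large repository needs several runs before every publication has a count, which is one reason to schedule the script periodically.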

The file [dspace-installDir]/config/modules/cris.cfg contains some relevant configurations

Code Block
ametrics.elsevier.scopus.enabled = ${cris.ametrics.elsevier.scopus.enabled} 
ametrics.elsevier.scopus.endpoint = ${cris.ametrics.elsevier.scopus.endpoint}
ametrics.elsevier.scopus.apikey = ${cris.ametrics.elsevier.scopus.apikey}
...
#scopus id
ametrics.identifier.eid = dc.identifier.scopus
ametrics.identifier.doi = dc.identifier.doi


Web of Knowledge

 

Code Block
[dspace-installDir]/bin/wos-retrieve

The bash script executes all the DSpace scripts needed to:

  • retrieve the citation count from Web of Knowledge (max 10000 publications per execution, ignoring publications whose citation count is less than 7 days old)
  • count the number of publications in WoK (those with a dc.identifier.isi)
  • aggregate the metrics at the researcher level

The file [dspace-installDir]/config/modules/cris.cfg contains some relevant configurations

Code Block
ametrics.thomsonreuters.wos.enabled = ${cris.ametrics.thomsonreuters.wos.enabled}
ametrics.thomsonreuters.wos.endpoint = ${cris.ametrics.thomsonreuters.wos.endpoint}
...
#wos id
ametrics.identifier.ut = dc.identifier.isi
Warning

By default, the system expects access to the WoK web service to be granted by IP address. If you need to authenticate with a username and password, you need to customize the [dspace-installDir]/config/crosswalks/wos-header.template file.

 

...