...

Info
Since DSpace-CRIS 5.10, basic deduplication features have also been implemented for CRIS objects, to identify and merge potential duplicates among researcher profiles, projects, organisations, etc.

The detection mechanism for CRIS objects is the same as the one illustrated above for DSpace items. Out of the box, it is possible to configure which metadata are used to identify duplicates for each object type. Custom signature algorithms can be implemented and activated via Spring beans, exactly as for publications (DSpace items).

Manage potential CRIS duplicates

A batch script is provided to manage potential duplicates among CRIS objects.

Code Block
usage: org.dspace.app.cris.batch.ScriptListAndRejectDedupObjects
       
 -c,--compare      compare two objects
 -h,--help         help
 -i,--id <arg>     object id
 -n,--note <arg>   reject note
 -r,--reject       reject two objects
 -t,--type <arg>   object type


USAGE:
 List duplicates: -t <object type> [-i <object id>]
 Compare two objects: -c -t <object type> -i <first object id> <second object id>
 Reject two duplicate objects: -r -t <object type> -i <first object id> <second object id> [-n <reject note>]
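The three modes in the usage summary can be wrapped in small shell functions so that the type and id arguments are always passed in the documented order. This is a minimal sketch that uses only the options listed above; the function names and the DSPACE_CMD variable are hypothetical, and it assumes the script is launched through `dsrun` from the `[dspace]/bin` directory, as in the listing example on this page.

```shell
#!/bin/sh
# Hypothetical wrappers around the documented modes of
# org.dspace.app.cris.batch.ScriptListAndRejectDedupObjects.
# DSPACE_CMD is an assumption: point it at [dspace]/bin/dspace.
DSPACE_CMD="${DSPACE_CMD:-./dspace}"
DEDUP_SCRIPT=org.dspace.app.cris.batch.ScriptListAndRejectDedupObjects

# List duplicate groups: dedup_list <type> [object id]
dedup_list() {
    if [ -n "$2" ]; then
        "$DSPACE_CMD" dsrun "$DEDUP_SCRIPT" -t "$1" -i "$2"
    else
        "$DSPACE_CMD" dsrun "$DEDUP_SCRIPT" -t "$1"
    fi
}

# Compare two objects: dedup_compare <type> <first id> <second id>
dedup_compare() {
    "$DSPACE_CMD" dsrun "$DEDUP_SCRIPT" -c -t "$1" -i "$2" "$3"
}

# Reject a pair: dedup_reject <type> <first id> <second id> [note]
dedup_reject() {
    if [ -n "$4" ]; then
        "$DSPACE_CMD" dsrun "$DEDUP_SCRIPT" -r -t "$1" -i "$2" "$3" -n "$4"
    else
        "$DSPACE_CMD" dsrun "$DEDUP_SCRIPT" -r -t "$1" -i "$2" "$3"
    fi
}
```

For example, `dedup_compare 9 101 202` would compare two researcher profiles; the ids 101 and 202 are placeholders.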

For example, to list all the groups of potential duplicates among researcher profiles, run:

Code Block
./dspace dsrun org.dspace.app.cris.batch.ScriptListAndRejectDedupObjects -t 9

Using -t 10 lists the potential duplicates among projects, and -t 11 among organisations. Potential duplicates of additional dynamic entities, such as journals or awards, can be listed in the same way once the numeric id of the dynamic object type is known.
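The listing command can be repeated over several object types in one pass. The loop below is a sketch: with no arguments it uses the three type ids given on this page, while the ids of dynamic entity types must be passed explicitly because this page does not list them. The function name and DSPACE_CMD are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list candidate duplicate groups for several CRIS object types.
DSPACE_CMD="${DSPACE_CMD:-./dspace}"
DEDUP_SCRIPT=org.dspace.app.cris.batch.ScriptListAndRejectDedupObjects

# Usage: list_dedup_groups [type id ...]
# Without arguments: researcher profiles (9), projects (10), organisations (11).
list_dedup_groups() {
    if [ "$#" -eq 0 ]; then
        set -- 9 10 11
    fi
    for t in "$@"; do
        # Print a header so the output of each type can be told apart.
        echo "== object type $t =="
        "$DSPACE_CMD" dsrun "$DEDUP_SCRIPT" -t "$t"
    done
}
```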

To flag a potential duplicate as a false positive, run the script in reject mode (-r), specifying the type of the objects with -t (9 for researcher profiles, etc.) and the ids of the two objects with -i; a note can optionally be added with -n.
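Because a rejection is stored only in the deduplication Solr core (see the warning below), it may be worth keeping a local record of each rejected pair. The helper below is a hypothetical sketch: it runs the documented -r mode and, only if the script exits successfully, appends the type, the two ids and the note to a tab-separated file. The file name, its format and the assumption that the script returns a non-zero exit status on failure are not part of DSpace-CRIS.

```shell
#!/bin/sh
# Hypothetical helper: reject a candidate pair and keep a local record of it,
# since the rejection itself is stored only in the deduplication Solr core.
DSPACE_CMD="${DSPACE_CMD:-./dspace}"
DEDUP_SCRIPT=org.dspace.app.cris.batch.ScriptListAndRejectDedupObjects
# Assumed log location and format: type<TAB>first id<TAB>second id<TAB>note
DEDUP_REJECT_LOG="${DEDUP_REJECT_LOG:-./cris-dedup-rejections.tsv}"

# Usage: reject_and_log <type> <first id> <second id> [note]
reject_and_log() {
    if [ -n "$4" ]; then
        "$DSPACE_CMD" dsrun "$DEDUP_SCRIPT" -r -t "$1" -i "$2" "$3" -n "$4" || return 1
    else
        "$DSPACE_CMD" dsrun "$DEDUP_SCRIPT" -r -t "$1" -i "$2" "$3" || return 1
    fi
    # Assumes a non-zero exit status on failure: only successful rejections are logged.
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4" >> "$DEDUP_REJECT_LOG"
}
```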

Warning

Please note that, unlike rejections of duplicate suggestions among DSpace items, rejections of CRIS duplicates are stored only in the deduplication Solr core. If you rebuild the deduplication core using the org.dspace.app.cris.batch.DedupClient script, this information may be lost.

The org.dspace.app.cris.batch.DedupClient script has also been extended to support the -t parameter, so that specific object types can be reindexed.
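After rebuilding the index for one object type, rejections recorded elsewhere can be re-applied with the reject mode of the listing script. The sketch below assumes that DedupClient's -t takes the same numeric type ids as the listing script and that no other DedupClient option is needed for a rebuild; check the script's help in your installation before relying on it. It also assumes a hypothetical local tab-separated log with one "type, first id, second id, note" line per rejection. The function name and DSPACE_CMD are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: rebuild the deduplication index for one object type,
# then re-apply rejections recorded in a local tab-separated file.
DSPACE_CMD="${DSPACE_CMD:-./dspace}"
DEDUP_CLIENT=org.dspace.app.cris.batch.DedupClient
DEDUP_SCRIPT=org.dspace.app.cris.batch.ScriptListAndRejectDedupObjects

# Usage: reindex_and_replay <type> <rejection log>
# Assumed log format: type<TAB>first id<TAB>second id<TAB>note
reindex_and_replay() {
    rtype="$1"
    rlog="$2"
    # Assumption: -t alone is enough for DedupClient to reindex this type.
    "$DSPACE_CMD" dsrun "$DEDUP_CLIENT" -t "$rtype" < /dev/null || return 1
    [ -f "$rlog" ] || return 0
    tab=$(printf '\t')
    while IFS="$tab" read -r t first second note || [ -n "$t" ]; do
        # Only replay rejections for the type that was just reindexed.
        [ "$t" = "$rtype" ] || continue
        # Redirect stdin so the command cannot consume the rest of the log.
        if [ -n "$note" ]; then
            "$DSPACE_CMD" dsrun "$DEDUP_SCRIPT" -r -t "$t" -i "$first" "$second" -n "$note" < /dev/null
        else
            "$DSPACE_CMD" dsrun "$DEDUP_SCRIPT" -r -t "$t" -i "$first" "$second" < /dev/null
        fi
    done < "$rlog"
}
```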

Merge Script

A batch script is provided to merge different instances of a CRIS object into a single one. The script works on any kind of entity (researcher profiles, organisation units, projects, etc.) according to the following rules:

...