Summary


Conceptual view: Extract, Transform and Load (ETL) metamodel


Approaches for mapping between the DSpace and VIVO models

First approach: Using the DSpace RDFizer

Ref: Linked (Open) Data 

The first approach uses the RDFizer that DSpace currently provides.

  • The transformation process we propose, which has been tested several times, consists of two steps:
    • First, translate the DSpace source data into an RDF graph representation. This task is performed by the DSpace RDFizer.
    • Second, translate the resulting RDF data from the DSpace perspective into the VIVO perspective using a SPARQL CONSTRUCT query.
  • The SPARQL query uses the following files as input:
    • The DSpace source data in RDF format
    • The RDF files describing the DSpace data semantics
    • The VIVO ontology, used to contextualize the data sources in the VIVO perspective
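
The second step can be sketched as follows. This is a minimal illustration, not the query used in the project: the property pairs in `MAPPING`, the target class and the helper `build_construct_query` are all assumptions introduced for the example. The sketch only assembles the text of a SPARQL CONSTRUCT query; running it against a triplestore is left out.

```python
# Prefixes used by the illustrative mapping below.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "bibo": "http://purl.org/ontology/bibo/",
}

# Hypothetical crosswalk: (DSpace-side predicate, VIVO-side predicate).
# These pairs are assumptions for the example, not a validated mapping.
MAPPING = [
    ("dcterms:title", "rdfs:label"),
    ("dcterms:abstract", "bibo:abstract"),
]


def build_construct_query(prefixes, mapping, target_class="bibo:Document"):
    """Return the text of a CONSTRUCT query that rewrites each mapped property.

    The first mapping entry is required (it binds ?item); the others are
    OPTIONAL, so items lacking those properties are still translated.
    """
    header = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    construct = [f"  ?item a {target_class} ."]
    where = []
    for i, (source, target) in enumerate(mapping):
        construct.append(f"  ?item {target} ?v{i} .")
        pattern = f"?item {source} ?v{i} ."
        where.append(f"  {pattern}" if i == 0 else f"  OPTIONAL {{ {pattern} }}")
    lines = header + ["CONSTRUCT {"] + construct + ["}", "WHERE {"] + where + ["}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_construct_query(PREFIXES, MAPPING))
```

Keeping the crosswalk in a table rather than in a hand-written query makes it easier to review the mapping with people who know the VIVO ontology but not SPARQL.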

Second approach: Build a VIVO exporter

  • This approach produces a VIVO representation of DSpace data by extracting the source data directly from the Postgres database used by DSpace.
  • Block 1 covers the semantization of the DSpace data structure. The database schema is extracted and stored in an XML or CSV file. The XML file is then translated into RDF with the XML2JSON and JSON2RDF utilities. The result is an RDF ontology representing the structure of the source database tables. This process carries no particular risk, since we have implemented this type of processing in many of our projects.
  • Block 2, which is processed in the same way as block 1, extracts the data contained in the database. The result is an RDF data graph structured from the DSpace perspective.
  • Finally, block 3 maps the semantized data from the DSpace perspective to semantized data from the VIVO perspective. As in the first approach, this mapping is performed by a SPARQL CONSTRUCT query.
  • The third approach follows the same process as the second, except that the source data comes from the DSpace REST API instead of the Postgres database.
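
Blocks 1 and 2 can be illustrated with a small sketch. It is a simplification under stated assumptions: it reads a CSV schema export and emits N-Triples directly, rather than going through the XML2JSON and JSON2RDF utilities, and the namespace `BASE`, the table and column names, and both helper functions are hypothetical.

```python
import csv
import io

BASE = "http://example.org/dspace-db/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
OWL = "http://www.w3.org/2002/07/owl#"


def iri_triple(s, p, o):
    """Format one N-Triples statement whose object is an IRI."""
    return f"<{s}> <{p}> <{o}> ."


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return text.replace("\n", "\\n").replace("\r", "\\r")


def schema_to_ntriples(schema_csv):
    """Block 1: one owl:Class per table, one owl:DatatypeProperty per column."""
    triples = []
    for row in csv.DictReader(io.StringIO(schema_csv)):
        table = BASE + row["table"]
        column = f"{table}#{row['column']}"
        triples.append(iri_triple(table, RDF_TYPE, OWL + "Class"))
        triples.append(iri_triple(column, RDF_TYPE, OWL + "DatatypeProperty"))
        triples.append(iri_triple(column, RDFS_DOMAIN, table))
    # Remove duplicates (each table is declared once per column) but keep order.
    return list(dict.fromkeys(triples))


def rows_to_ntriples(table, key, rows):
    """Block 2: one resource per row, one literal per non-null column."""
    triples = []
    for row in rows:
        subject = f"{BASE}{table}/{row[key]}"
        triples.append(iri_triple(subject, RDF_TYPE, BASE + table))
        for column, value in row.items():
            if column == key or value is None:
                continue
            predicate = f"{BASE}{table}#{column}"
            triples.append(f'<{subject}> <{predicate}> "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    schema = "table,column,type\nitem,uuid,uuid\nitem,title,text\n"
    for line in schema_to_ntriples(schema):
        print(line)
    for line in rows_to_ntriples("item", "uuid", [{"uuid": "42", "title": "A title"}]):
        print(line)
```

The output of both functions is a graph in the DSpace perspective; block 3 would then apply the CONSTRUCT mapping to it.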

Third approach: Using Model-Driven Development based on the OpenAPI/Swagger specification


Software communication between VIVO and DSpace

Architectural solution 1: DSpace/VIVO facade

A single, standardized access point between VIVO and DSpace.

DSpace/LOD Document: https://wiki.lyrasis.org/display/DSDOC7x/Linked+%28Open%29+Data

  • Facade is a structural design pattern that provides a simple interface to a library, a framework or any other complex set of classes.
  • The RDFizer is an on-demand translator that converts DSpace data into a set of RDF triples, stored in a triplestore that is accessible through a SPARQL endpoint.
  • The DSpace/VIVO facade is accessible to a web client and provides a single entry point that unifies communication between the various components of the ecosystem.
  • SPARQL federated search unifies the results of a search even though the data is distributed over the two data sources, Fuseki and VIVO.
  • The periodicity of data synchronization is delegated to a service external to the facade, which is in effect a kind of web client.
Architectural solution 2: Add semantic web functionality to DSpace

  1. Extend the storage layer by adding an RDF triple store (TDB) whose contents are synchronized in real time with the system's metadata database.
  2. Add a SPARQL query editor and a SPARQL endpoint API to the application layer.
  3. Data synchronization between VIVO and DSpace is ensured by the facade through the SPARQL protocol.
  4. The DSpace semantic instance can thus become a LOD node.
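
One way to picture the real-time synchronization in item 1 is a write-through hook: every change to the metadata store is pushed to the triple store in the same operation. The sketch below is a toy model under that assumption; `MetadataStore` and `TripleStore` are hypothetical classes, not DSpace or TDB APIs.

```python
class MetadataStore:
    """Toy stand-in for the relational metadata database."""

    def __init__(self):
        self._values = {}
        self._listeners = []

    def subscribe(self, listener):
        """Register a callback invoked as listener(item_id, field, old, new)."""
        self._listeners.append(listener)

    def put(self, item_id, field, value):
        """Store a value and notify listeners of the change."""
        old = self._values.get((item_id, field))
        self._values[(item_id, field)] = value
        for listener in self._listeners:
            listener(item_id, field, old, value)


class TripleStore:
    """Toy stand-in for the RDF triple store kept in sync."""

    def __init__(self):
        self.triples = set()

    def on_change(self, item_id, field, old, new):
        """Replace the previous triple for this field, if any, with the new one."""
        if old is not None:
            self.triples.discard((item_id, field, old))
        self.triples.add((item_id, field, new))


if __name__ == "__main__":
    metadata = MetadataStore()
    rdf = TripleStore()
    metadata.subscribe(rdf.on_change)
    metadata.put("item:1", "dcterms:title", "Draft title")
    metadata.put("item:1", "dcterms:title", "Final title")
    print(rdf.triples)
```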

Architectural solution 3: Messaging patterns

  1. The main objective of the messaging design pattern is to decouple the software from its external interfaces.
  2. This pattern allows iterative interface development while maintaining backward compatibility.
  3. A message is an exchange of information between a sender and one or more receivers. Message management is provided by the messaging system.
  4. In the message flow example, DSpace is the sender of the message, and the receivers are VIVO and the other sources connected to the messaging system.
  5. Data from the different sources is synchronized in real time.
  6. The DSpace/VIVO facade allows federated searches to be executed using SPARQL queries.
  7. Data can also be accessed by a client directly from the messaging system.
  8. The current VIVO-DataConnect project uses this pattern. It is specifically designed to standardize the integration of external data sources such as ORCID.
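
The decoupling described in items 1 to 4 and item 7 can be sketched with a minimal in-process publish/subscribe broker. It is an illustration only: `MessageBroker`, the topic name and the message shape are hypothetical, and a production setup would use a real messaging system rather than this class.

```python
class MessageBroker:
    """Minimal publish/subscribe broker: senders and receivers only share topics."""

    def __init__(self):
        self._subscribers = {}
        self.log = []  # retained messages, readable directly by a client

    def subscribe(self, topic, handler):
        """Register a callable that receives every message published on `topic`."""
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        """Record the message, then deliver it to each subscriber of the topic."""
        self.log.append((topic, message))
        for handler in self._subscribers.get(topic, []):
            handler(message)


if __name__ == "__main__":
    broker = MessageBroker()
    vivo_inbox, other_inbox = [], []

    # VIVO and another connected source both listen; DSpace does not know them.
    broker.subscribe("item.updated", vivo_inbox.append)
    broker.subscribe("item.updated", other_inbox.append)

    # DSpace acts as the sender.
    broker.publish("item.updated", {"id": "item:1", "title": "Final title"})
    print(vivo_inbox, other_inbox, broker.log)
```

Because the sender only knows the topic, a new receiver can be added without changing DSpace, which is what allows the interfaces to evolve iteratively.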

In summary: Architecture comparison table
