Goals

The history of an item should be available, and it should be possible to cite a particular version of the item.

Requirements

  1. Versioning is at the level of data files.
  2. Data packages and data files have a relationship between their identifiers. Each version of a file/package is represented by a separate DOI. The "base" DOI points to the most recent version of the file/package, but a version number can be added to the base DOI to access previous versions. See the [[]] page for more details about the DOI format.
  3. When a new version of a data file is deposited, a new identifier will be created.
  4. Adding or changing README files or metadata will not result in the creation of a new identifier.
  5. All metadata changes are logged by writing a metadata snapshot to the filesystem. This allows us to retain a record of the changes, even though they are not available as explicit versions (and are not viewable in the UI). If all we are doing is adding a new bitstream without changing the existing bitstreams, there is no need to force a version number change.
  6. When a new identifier is created for a data file, a new identifier is automatically created for the corresponding data package.
  7. Each version of the item includes metadata about who created the version and the date/time. (It is essentially a full copy of the item, with modifications.)
  8. Only the most recent version of an item is available via the search interface.
  9. On the item page, there is a link to view previous/subsequent versions.
  10. By examining the metadata, it is possible to determine whether an item is the most recent version, the original version, or an intermediate version.
  11. Previous versions of bitstreams are retained. If something was once retrievable, it is always retrievable.
  12. Creation of a new version is initiated by the author. On the "submissions" page, users should see all of their archived submissions. Each archived submission should have a button to submit a new version of a data file. This button does not appear anywhere else.
  13. Expose versioning details in the DSpace API and services (e.g. OAI, BagIt).
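
Requirement 5 can be illustrated with a minimal sketch. It assumes one plain-text file per metadata change, named after the item id and a timestamp. The class name MetadataSnapshotLog, the file naming, and the "field=value" layout are all hypothetical and are not part of the existing DSpace code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: writes one snapshot file per metadata change. */
final class MetadataSnapshotLog {

    private final Path directory;

    MetadataSnapshotLog(Path directory) {
        this.directory = directory;
    }

    /** Renders metadata as "field=value" lines, sorted by field name. */
    static String format(Map<String, String> metadata) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<String, String>(metadata).entrySet()) {
            body.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return body.toString();
    }

    /** Writes the snapshot to item-<itemId>-<timestampMillis>.txt and returns its path. */
    Path snapshot(int itemId, Map<String, String> metadata, long timestampMillis) {
        Path file = directory.resolve("item-" + itemId + "-" + timestampMillis + ".txt");
        try {
            Files.createDirectories(directory);
            Files.write(file, format(metadata).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }
}
```

Sorting the fields keeps two snapshots of the same item comparable line by line, which is the only record of a metadata edit when no explicit version is created.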

Technical Details

Work Areas for Implementation of Version History

  • Database Modifications
  • Data Access Objects and Domain Model definition
  • Enhancing DSpace XMLUI Item Adapter to expose Version Details on Items
  • Item User Interface Changes to support Versioning
    • Details that should be presented in User Interface
    • Actions that should be possible on existing versions (compare, revert, delete, withdraw)
    • Actions that should be possible to generate a new version
    • Actions that should be possible to users (submitter, author, curator)

User Scenarios

General user actions that would generate a new Version of a Dryad Item, and their overall impact on the creation of a new DSpace Item.

| Scenario | Description | New Data Package Version | New Data File Version |
|---|---|---|---|
| Add Data File to Existing Data Package | Add a Data File Item to Dryad and add to existing Data Package | yes | yes |
| Delete Data File from Package | Remove a Data File from an existing Data Package | yes | yes |
| Replace Data File in existing Package | Remove an existing Data File (2) and add a new one (1) | yes | yes |
| Edit Metadata in Data File | Metadata edits do not produce new Items | no | no |
| Edit Metadata in Data Package | Metadata edits do not produce new Items | no | no |
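
The scenarios above amount to a small decision rule. The following sketch encodes it as an enum, assuming the yes/no values listed for each scenario. The name DepositAction and its constants are hypothetical and only serve the illustration.

```java
/** Hypothetical enumeration of the user actions listed in the scenarios above. */
enum DepositAction {
    ADD_DATA_FILE(true, true),
    DELETE_DATA_FILE(true, true),
    REPLACE_DATA_FILE(true, true),
    EDIT_DATA_FILE_METADATA(false, false),
    EDIT_DATA_PACKAGE_METADATA(false, false);

    private final boolean newPackageVersion;
    private final boolean newFileVersion;

    DepositAction(boolean newPackageVersion, boolean newFileVersion) {
        this.newPackageVersion = newPackageVersion;
        this.newFileVersion = newFileVersion;
    }

    /** True if the action results in a new Data Package version. */
    boolean createsNewPackageVersion() {
        return newPackageVersion;
    }

    /** True if the action results in a new Data File version. */
    boolean createsNewFileVersion() {
        return newFileVersion;
    }
}
```

A submission step could consult this rule before asking the identifier service for a new DOI, so that metadata-only edits never mint one.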

Metadata Mapping

General Versioning of DSpace Items

Generic Versioning of DSpace Items will involve the alteration of existing handles for those versions in DSpace.

| | Version 1 | Version 2 | Version 3 |
|---|---|---|---|
| dc.identifier.uri | 1234.5/1.1 | 1234.5/1.2 | 1234.5/1.3 |
| dc.relation.isReplacedBy | 1234.5/1.2 | 1234.5/1.3 | |
| dc.relation.replaces | | 1234.5/1.1 | 1234.5/1.2 |
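
Given the fields above, the position of an item in its version history can be read off its metadata, as requirement 10 asks. The sketch below assumes metadata is available as a simple field-to-value map. The names VersionPosition and VersionPositions are hypothetical, and the UNVERSIONED case (neither relation present) is an added assumption.

```java
import java.util.Map;

/** Hypothetical classification of an item within its version history. */
enum VersionPosition { UNVERSIONED, ORIGINAL, INTERMEDIATE, MOST_RECENT }

final class VersionPositions {

    private VersionPositions() { }

    /** Classifies an item from the presence of the two relation fields. */
    static VersionPosition classify(Map<String, String> metadata) {
        boolean hasPredecessor = metadata.containsKey("dc.relation.replaces");
        boolean hasSuccessor = metadata.containsKey("dc.relation.isReplacedBy");
        if (hasPredecessor && hasSuccessor) {
            return VersionPosition.INTERMEDIATE;
        }
        if (hasSuccessor) {
            return VersionPosition.ORIGINAL;    // replaced, but replaces nothing
        }
        if (hasPredecessor) {
            return VersionPosition.MOST_RECENT; // replaces something, not yet replaced
        }
        return VersionPosition.UNVERSIONED;     // assumption: item was never versioned
    }
}
```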

While past recommendations are that identifiers assigned to items be opaque, we see two possible benefits from selecting a versioning scheme for the identifiers.

1. The original Item and the version currently being referred to are captured in the identifier.
2. Resolution of the most current version can be attained programmatically without having to navigate the version hierarchy.

The goal behind this versioning approach is to capture the version of the Item while retaining its original version stream id. In the above case the DSpace Item Handle will be added to the Handle Manager and will be resolvable via CNRI.

  • Reference to the most current version of the Item: http://hdl.handle.net/1234.5/1
  • Reference to a specific version in the version history: http://hdl.handle.net/1234.5/1.1
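
The second benefit, resolving the most current version without walking the history, can be sketched as plain string handling. This assumes the version is always the part after the last dot in the handle suffix, as in the examples above. The class VersionedHandle is hypothetical and is not an existing DSpace or Handle System API.

```java
/** Hypothetical parser for handles of the form <prefix>/<stream>.<version>. */
final class VersionedHandle {

    final String streamHandle; // e.g. "1234.5/1", resolves to the most current version
    final int version;         // e.g. 1 for "1234.5/1.1"

    private VersionedHandle(String streamHandle, int version) {
        this.streamHandle = streamHandle;
        this.version = version;
    }

    static VersionedHandle parse(String handle) {
        int slash = handle.indexOf('/');
        int dot = handle.lastIndexOf('.');
        // The version separator must appear in the suffix, after the slash.
        if (slash < 0 || dot < slash) {
            throw new IllegalArgumentException("Not a versioned handle: " + handle);
        }
        return new VersionedHandle(handle.substring(0, dot),
                Integer.parseInt(handle.substring(dot + 1)));
    }

    /** URL that refers to the most current version of the Item. */
    String latestUrl() {
        return "http://hdl.handle.net/" + streamHandle;
    }

    /** URL that refers to this specific version in the version history. */
    String versionUrl() {
        return "http://hdl.handle.net/" + streamHandle + "." + version;
    }
}
```

For example, parsing "1234.5/1.2" yields the stream handle "1234.5/1", so a client holding any versioned handle can derive the citation URL for the latest version locally.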

Versioning of the Dryad Data Package and its Identifiers

Adopting the general solution for composite Dryad Data Packages with Data Files will employ the same approach, but with DOI identifiers.

Versioning of the Data Package will utilize versioning capabilities already inside the DOI service used to mint DOIs (written by Kevin Clark).

 

| | Package Version 1 | Package Version 2 | Package Version 3 |
|---|---|---|---|
| dc.identifier | doi:10.651/dryad.154.1 | doi:10.561/dryad.154.2 | doi:10.561/dryad.154.3 |
| dc.relation.isReplacedBy | doi:10.561/dryad.154.2 | doi:10.561/dryad.154.3 | |
| dc.relation.replaces | | doi:10.561/dryad.154.1 | doi:10.561/dryad.154.2 |

The solution will also apply to individual Data Files when it is designated that the Data File is being replaced in the Data Package rather than being simply removed.

 

| | File Version 1 | File Version 2 | File Version 3 |
|---|---|---|---|
| dc.identifier | doi:10.651/dryad.154.1/1.1 | doi:10.561/dryad.154.2/1.2 | doi:10.561/dryad.154.3/1.3 |
| dc.relation.isReplacedBy | doi:10.561/dryad.154.2/1.2 | doi:10.561/dryad.154.3/1.3 | |
| dc.relation.replaces | | doi:10.561/dryad.154.1/1.1 | doi:10.561/dryad.154.2/1.2 |

DSpace Data Model and Versioning

Past Architectural Review Group work on the DSpace Data Model and versioning focused on replicating the item contents when a new DSpace Item was created in DSpace. This means that all item metadata and content would be replicated. This was done to ease the complexity of managing references across individual Items to the content and metadata that would be considered the same across those versions.

In the GSoC project to address versioning, we made an effort to improve on the above situation and employed the following strategy for producing a new version of a DSpace Item.

DSpace Item Objects:

DSpace Bundle Objects:

For every new version of an Item, a new Bundle will be created, and that Bundle will link to all the preexisting Bitstreams of the original Item. This means that a Bitstream may be associated with more than one Item through the Bundles that link to it. This may give rise to unexpected behavior in some of the DSpaceObject code that retrieves the parent Item from a Bitstream.
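
The effect on parent lookup can be seen in a minimal sketch. It uses a stripped-down, hypothetical object model (SketchItem, SketchBundle, SketchBitstream) rather than the real DSpace classes, and only models the links between them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Bitstream that knows which bundles link to it. */
final class SketchBitstream {
    final String name;
    final List<SketchBundle> bundles = new ArrayList<>();

    SketchBitstream(String name) {
        this.name = name;
    }

    /** All items that reach this bitstream through one of their bundles. */
    List<SketchItem> parentItems() {
        List<SketchItem> parents = new ArrayList<>();
        for (SketchBundle bundle : bundles) {
            if (bundle.owner != null && !parents.contains(bundle.owner)) {
                parents.add(bundle.owner);
            }
        }
        return parents;
    }
}

/** Hypothetical stand-in for a Bundle owned by exactly one item. */
final class SketchBundle {
    final String name;
    final List<SketchBitstream> bitstreams = new ArrayList<>();
    SketchItem owner;

    SketchBundle(String name) {
        this.name = name;
    }

    void addBitstream(SketchBitstream bitstream) {
        bitstreams.add(bitstream);
        bitstream.bundles.add(this);
    }
}

/** Hypothetical stand-in for an Item with a revision number. */
final class SketchItem {
    final int revision;
    final List<SketchBundle> bundles = new ArrayList<>();

    SketchItem(int revision) {
        this.revision = revision;
    }

    void addBundle(SketchBundle bundle) {
        bundles.add(bundle);
        bundle.owner = this;
    }

    /** New version: fresh bundles that link to the existing bitstreams, no copies. */
    SketchItem newVersion() {
        SketchItem next = new SketchItem(revision + 1);
        for (SketchBundle old : bundles) {
            SketchBundle dupe = new SketchBundle(old.name);
            for (SketchBitstream bitstream : old.bitstreams) {
                dupe.addBitstream(bitstream);
            }
            next.addBundle(dupe);
        }
        return next;
    }
}
```

After one call to newVersion(), a bitstream of the original item reports two parent items, so any code that assumes a single parent would need to choose one explicitly.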

DSpace Bitstream Objects:

Business Model

Versioning Service

Identifier Service

Storage Considerations

  • Should Versions and VersionHistories be stored in separate tables from the Items considered part of the repository?
  • Version and VersionHistory may not be stored in Metadata, but instead calculated and added as additional metadata in the item?
  • Will previous versions of items be represented separately from the latest version in the Item table? This will be required for methods based on finding all items to be indexed, batch exported, updated by MediaFilter, ...
    • Using a separate column: latestVersion?
    • Using the in_archive column?

Administrative Interface possibilities

Versioning edits to an Item

With a full-featured versioning capability on Items, we may be able to support restoration of Item Versions through an interface such as this wiki page example interface.

Versioning an individual Bitstream on an Item

Versioning individual Bitstreams can be compared to versioning attachments to a wiki page.

Further Scenarios

  • Deleting an Item from the Repository
  • Moving an Item Between Collections
  • Mapping an Item to another collection

Previous WIKI Page needing integration into above...

Versioning Services will layer on top of planned Resolver and Identifier Minting services to provide a layering of functionality where organizations can alter the versioning behavior and introduce their own enhancements:

Versioning Interaction with existing DSpace Systems

Start a new version

  • Versioning a new Item will be an option on the "Context Menu".
  • The action will create a new Item and place it into the Submission Workflow.
  • In Navigation.java, for Item Versioning we will introduce the following new context menu entry:
    context.addItem().addXref(contextPath+"/admin/newversion?itemID="+item.getID(), T_context_version_item);

The call to create the new Item will be issued to the versioning service as follows:

ItemVersioningService vs = new DSpace().getServiceManager().getServiceByName(null, ItemVersioningService.class);

Item newVersion = vs.createNewVersion(item);

The call will be initiated from the JavaScript Administrative Controller. (We may need to come up with a strategy for implementing calls into the ServiceManager from the Controller.)

/**
 * Start versioning a new item.
 */
function startVersionItem()
{
    var itemID = cocoon.request.get("itemID");
    // Verify that we can create a new version.
    assertVersionItem(itemID);

    // Create a new version in the submission workflow.
    var newItemID = new DSpace().getServiceManager()....doVersionItem(itemID);

    // Restart editing the new item as if it were part of the submission workflow.
    var newItem = Item.find(getDSContext(), newItemID);

    cocoon.redirectTo(cocoon.request.getContextPath()+"/submit/"+newItem.getHandle(),true);
    getDSContext().complete();
    newItem = null;
    cocoon.exit();
}

Code approach in GSoC Versioning project

from ArchiveManager (Possible methods for VersioningService)

/**
     * Gets an Item by its OriginalItemID and Revision numbers
     */
    public static DSpaceObject getVersionedItem(Context context, int originalItemID, int revision)
    {

        return ItemDAOFactory.getInstance(context).getByOriginalItemIDAndRevision(originalItemID, revision);
    }
    
    /**
     * Gets the HEAD of an OriginalItemID
     */
    public static DSpaceObject getHeadRevision(Context context, int originalItemID)
    {

        return ItemDAOFactory.getInstance(context).getHeadRevision(originalItemID);
    }
    
    /**
     * Creates an Item in the database that maintains all the same
     * attributes and metadata as the Item it supplants, with a new
     * revision number and a link to the current head revision as its
     * previous revision. No new bitstreams are created.
     *
     * This Item is ready to be put into the Workspace or a Workflow.
     *
     * @param context The DSpace context
     * @param originalItem The Item to create a new version of
     */
    public static Item newVersionOfItem(Context context, Item originalItem)
    {
        try
        {
            ArchiveManager am = new ArchiveManager();
            ItemDAO itemDAO = ItemDAOFactory.getInstance(context);
            WorkspaceItemDAO wsiDAO = WorkspaceItemDAOFactory.getInstance(context);
            Item item = itemDAO.create();
            Item head = itemDAO.getHeadRevision(originalItem.getOriginalItemID());

            item.setArchived(false);
            item.setWithdrawn(originalItem.isWithdrawn());
            // Done by ItemDAO.update ... item.setLastModified();

            item.setOriginalItemID(originalItem.getOriginalItemID());

            item.setRevision(head.getRevision()+1);
            item.setPreviousItemID(head.getID());
            //System.out.println("Head: " + head.toString());

            item.setOwningCollectionId(originalItem.getOwningCollection().getID());
            item.setSubmitter(originalItem.getSubmitter().getID());

            item.setMetadata(originalItem.getMetadata());
            // Clear the dc.identifier.uri value copied from the original Item
            item.clearMetadata("dc", "identifier", "uri", null);
            

            for (Bundle bundle : originalItem.getBundles())
            {
                item.addBundle(am.dupeBundle(context, bundle));
            }

            itemDAO.update(item);
            wsiDAO.create(item);
            return item;
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }

/**
     *  Takes in a bundle and makes a deep copy of it.
     *  Without duping bitstreams.
     *
     *  @param bundle
     */
    private Bundle dupeBundle (Context context, Bundle bundle)
    throws AuthorizeException
    {
        BundleDAO bdao = BundleDAOFactory.getInstance(context);
        Bundle dupe = bdao.create();
        Bitstream[] bitstreams = null;
        int primary = bundle.getPrimaryBitstreamID();

        bitstreams = bundle.getBitstreams();
        for (Bitstream b : bitstreams)
        {
            dupe.addBitstream(b);
            if (primary == b.getID())
            {
                dupe.setPrimaryBitstreamID(b.getID());
            }
        }

        dupe.setName(bundle.getName());
        return dupe;
    }
}

Will Create a New Item and Place it into the Submission Workflow

  • Separate New Versions of an Item May be started
  • Can only one new version be started, until it has been finalized?
  • Should the new version of the data package, data files, and bitstreams be processed in the submission and/or reviewing workflow?
  • Should information about the revision be hidden until approved?
  • Should the handle of a replaced item automatically point to the latest version?

Versioning of item metadata

The metadata for either a datapackage or a data file can be altered.

  • Should all items receive a new version number at once?

Versioning of files

  • If no files are altered, the new version should preferably reference the same bitstreams without duplication.
  • If a new file is uploaded, or a file is replaced by a URL, the new data file will no longer reference the original file.
  • Should URLs for files remain the same if the file didn't change?

Two User Stories

Simple Item Versioning Case

The first example is the most basic and involves providing a means to request a new version of an Item. The request is made through the following API:

public interface VersioningService
{
    public <T> Version<T> createNewVersion(T original);
}

It will be used by the application in the following manner:

Item item = ....

String dsoId = "dso:item/" + item.getID();

Version<Item> itemVersion = new DSpace().getServiceManager().getServiceByName(null, VersioningService.class).createNewVersion(item);

In the simplest use case, a new Version of an Item will be created. The new and the previous version will have the following characteristics:

New Version

| Field | Value (Relationship) |
|---|---|
| dc.identifier.uri | <new-handle> |
| dc.relation.isVersionOf | <original-handle> |
| dc.relation.replaces | <previous item> (identifier of previous version) |

Previous Version

| Field | Value (Relationship) |
|---|---|
| dc.identifier.uri | <previous-handle> |
| dc.relation.isVersionOf | <original-handle> |
| dc.relation.isReplacedBy | <new item> (identifier of new Item) |
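
How these fields change when a version is created can be sketched with an in-memory stand-in for a version history. The class InMemoryVersionHistory is hypothetical and works on plain metadata maps. It is not the VersioningService itself, and the handles are supplied by the caller instead of being minted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory version history built from metadata maps. */
final class InMemoryVersionHistory {

    private final String originalHandle;
    private final List<Map<String, String>> versions = new ArrayList<>();

    InMemoryVersionHistory(String originalHandle) {
        this.originalHandle = originalHandle;
        versions.add(newRecord(originalHandle));
    }

    /** Every record carries its own identifier and the shared history identifier. */
    private Map<String, String> newRecord(String handle) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("dc.identifier.uri", handle);
        metadata.put("dc.relation.isVersionOf", originalHandle);
        return metadata;
    }

    /** Appends a new version and links it with the previous head in both directions. */
    Map<String, String> createNewVersion(String newHandle) {
        Map<String, String> previous = head();
        Map<String, String> next = newRecord(newHandle);
        next.put("dc.relation.replaces", previous.get("dc.identifier.uri"));
        previous.put("dc.relation.isReplacedBy", newHandle);
        versions.add(next);
        return next;
    }

    /** The most recent version. */
    Map<String, String> head() {
        return versions.get(versions.size() - 1);
    }

    /** All versions, oldest first. */
    List<Map<String, String>> all() {
        return versions;
    }
}
```

Note that createNewVersion modifies the previous record to add dc.relation.isReplacedBy. Whether that write to an archived item is acceptable is one of the open questions discussed further down this page.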

We need to consider that the criteria for what constitutes a version of an Item will evolve with the feature and its usage. But we have a basic agreement that the use of fields such as those above will be critical for versioning.

isReplacedBy and replaces will link individual nodes in the version history, while isVersionOf should be sufficient to identify an "original version history thread" for an Item.

The Versioning System should reuse the identifiers generated by the MintingService and ResolverService as values. This will allow the underlying identifier types to be changed out as desired while encapsulating the versioning logic cleanly within the VersioningService.

Complex Item Case (Dryad Composite DataPackage/DataFile versioning)

TODO (feel free to expand on)

See For Further detail: [[]]

To clarify a little on the last point: we will probably be adjusting the HandleManager to ensure that we can have a handle specific to the most current item that can be assigned separately from the current version's handle.

The current logic in reassigning a handle is the following:

1.) The item associated with the handle in the HandleManager is changed.
2.) The item metadata in dc.identifier.uri is updated

If we generate a separate version handle that always points at the most current, it would reside in a separate metadata field (dc.relation.isVersionOf) and would not be altered across versions.

We may consider using dc.relation.replaces / isReplacedBy for pointing backwards/forwards in the version history. If we do not use dc.relation.isReplacedBy and just use dc.relation.replaces, we can avoid altering the original metadata record. But there is still some question in my mind as to the importance of flagging that the current item has been "replaced".

as we discussed last Friday

Both the DOI and Handle use cases require that a persistent id be created that represents the latest version of the Item. Ideally, this would be both calculated and serialized in the metadata in a manner that avoids having to update previous items when new items are added. This means that a VersionHistory identifier may not actually be "unique" in our metadata, but resolving it would always return the most recent version.

For Item doi:10.651/dryad.154

For the next version (doi:10.651/dryad.154.2)
dc.identifier: doi:10.651/dryad.154.2
dc.relation.isVersionOf: doi:10.651/dryad.154 <--- should be present in all Items in VersionHistory, will be used to look up the entire version history
dc.relation.replaces: doi:10.651/dryad.154.1 <--- Will be used to trace the Revision Tree.

Where the previous version would have (doi:10.651/dryad.154.1)
dc.identifier: doi:10.651/dryad.154.1
dc.relation.isVersionOf: doi:10.651/dryad.154 <--- should be present in all Items in VersionHistory, will be used to look up the entire version history
dc.relation.isReplacedBy: doi:10.651/dryad.154.2

In the Handle approach, this will look like:

For the next version (hdl:1234.5/3)
dc.identifier: hdl:1234.5/3
dc.relation.isVersionOf: hdl:1234.5/1 <--- should be present in all Items in VersionHistory, will be used to look up the entire version history
dc.relation.hasVersion: hdl:1234.5/2
dc.relation.replaces: hdl:1234.5/2 <--- Will be used to trace the Revision Tree.

Where the previous version (hdl:1234.5/2) would have
dc.identifier: hdl:1234.5/2
dc.relation.isVersionOf: hdl:1234.5/1 <--- should be present in all Items in VersionHistory, will be used to look up the entire version history
dc.relation.isReplacedBy: hdl:1234.5/3

At this time, the HandleManager simply assigns a handle to an Item in DSpace. The adjustment that will need to be made is that whenever a new version is generated, the handle representing the version stream (hdl:1234.5/1) will need to be moved to the new Item. However, because it is by definition the identifier for the VersionHistory and will always resolve to the most current version, no item metadata will need to be updated to reflect that change. This handle should be used when citing the most current version of the item.
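
The handle move described above can be sketched with a simple lookup table. SketchHandleRegistry is a hypothetical stand-in for the HandleManager, and the item ids used with it are made up for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the HandleManager: maps handles to item ids. */
final class SketchHandleRegistry {

    private final Map<String, Integer> handleToItem = new HashMap<>();

    void bind(String handle, int itemId) {
        handleToItem.put(handle, itemId);
    }

    /** Returns the item id bound to the handle, or null if it is unknown. */
    Integer resolve(String handle) {
        return handleToItem.get(handle);
    }

    /**
     * Registers a new version: the version gets its own handle, and the
     * version stream handle is moved so that it resolves to the new item.
     * Handles of earlier versions are left untouched.
     */
    void registerNewVersion(String streamHandle, String versionHandle, int newItemId) {
        bind(versionHandle, newItemId);
        bind(streamHandle, newItemId);
    }
}
```

Because only the registry entry for the stream handle changes, the dc.relation.isVersionOf value stored in each item stays valid without any metadata update.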

If we were to use dc.relation.replaces and dc.relation.isVersionOf to identify both the VersionHistory and and not dc.relation.isReplacedBy to resolve

Tasks to be completed to support Versioning of Items:

Database Schema

Task: Create Database changes to support version history/version, similar to Workspaceitem / workflowitem

Task: Document Database Changes here:

Domain Model and Data Access Support Changes

Task: Create Data access objects and domain model to get versions from database

Task: Document Domain Model Here

Task: Utilize ServiceManager to deliver Versioning Service to Application.

Task: Provide Examples of how Service Manager is used to get Service here

XMLUI Support

Task: We will create a new Versioning Aspect Project for changes to XMLUI

Extensible ItemAdapter

Task: Extend the ItemAdapter to have a "administrativeMetadata" section, it would be good if this was pluggable.

To get there, we need to review the ItemAdapter overrides that are available in atmire-xmlui-api...

https://atmire.com/svn/modules/dspace-atmire-xmlui/trunk/dspace-atmire-xmlui-api/src/main/java/org/dspace/app/xmlui/objectmanager/

Express Administrative Versioning Metadata in METS Administrative Section of

http://host/xmlui/metadata/handle/1234.5/102/mets.xml

<METS:METS .... ID="1234.5/102">

<METS:amdSec ID="RELS-EXT" STATUS="A">
    <METS:techMD ID="VERSION-HISTORY-0.1">
        <METS:mdWrap LABEL="DSpace Version Metadata" MDTYPE="OTHER"
                     FORMAT_URI="info:dspace/foobar"
                     MIMETYPE="application/rdf+xml" OTHERMDTYPE="UNSPECIFIED">
            <METS:xmlData>
                <rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
                         xmlns:terms="http://purl.org/dc/terms/"
                         xmlns:myns="http://www.nsdl.org/ontologies/relationships#"
                         xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

                    <rdf:Description rdf:about="hdl:1234.5/100">
                        <dc:identifier>1.1</dc:identifier>
                        <dc:description>Summary of New Version</dc:description>
                        <dc:creator>mdiggory@gmail.com</dc:creator>
                        <terms:created>2011-01-01::12:12:12Z...</terms:created>
                        <terms:isReplacedBy rdf:resource="hdl:1234.5/101"/>
                    </rdf:Description>

                    <rdf:Description rdf:about="hdl:1234.5/101">
                        <dc:description>Summary of New Version</dc:description>
                        <dc:creator>mdiggory@gmail.com</creator>
                        <terms:created>2011-01-01::12:12:12Z...</terms:created>
                        <dc:identifier>1.2</dc:identifier>
                        <terms:replaces rdf:resource="hdl:1234.5/100"/>
                        <terms:isReplacedBy rdf:resource="hdl:1234.5/102"/>
                    </rdf:Description>

                    <rdf:Description rdf:about="hdl:1234.5/102">
                        <dc:description>Summary of New Version</dc:description>
                        <dc:identifier>1.3</dc:identifier>
                        <dc:creator>mdiggory@gmail.com</creator>
                        <terms:created>2011-01-01::12:12:12Z...</terms:created>
                        <terms:replaces rdf:resource="hdl:1234.5/101"/>
                        <terms:isReplacedBy rdf:resource="hdl:1234.5/103"/>
                    </rdf:Description>

                    <rdf:Description rdf:about="hdl:1234.5/103">
                        <dc:identifier>1.4</dc:identifier>
                        <dc:description>Summary of New Version</dc:description>
                        <dc:creator>mdiggory@gmail.com</creator>
                        <terms:created>2011-01-01::12:12:12Z...</terms:created>
                        <terms:replaces rdf:resource="hdl:1234.5/102"/>
                    </rdf:Description>

                </rdf:RDF>

            </METS:xmlData>
        </METS:mdWrap>
    </METS:techMD>
</METS:amdSec>

Utilize DC and DC TERMS where appropriate for time being.

http://dublincore.org/documents/2008/01/14/dcmi-terms/#terms-isVersionOf
http://dublincore.org/documents/2008/01/14/dcmi-terms/#terms-hasVersion
http://dublincore.org/documents/2008/01/14/dcmi-terms/#terms-isReplacedBy

We will decide on more fields and document them in the wiki page.

Rendering to HTML

Task: Add Versioning DIV section to ItemView rendering the RDF admMeta section in the mets doc.

Actions in XMLUI

Task: Add Administrative Option in Item Edit View to create a new version

Task: New Version should go into WorkspaceItem table of whomever created it and be opened in the Submission workflow for the user to edit metadata and bitstreams.

Reviewer Workflow Step

Identify how the Reviewer Workflow Step should address the new Item.

Curator Workflow Step

Identify how the Curator Workflow Step will show that the Item is a New Version.

Add To Archive

TBD, Should replacement of the previous version in the archive be determined by user? Examples:

  1. Withdraw Previous Item From Search and View
  2. Withdraw Previous Item For Search Only