JIRA Reference: https://jira.duraspace.org/browse/DS-638
Proposed: "This patch uses JHOVE to provide rough-and-ready format checking by identifying that the file/bitstream extension matches formats verifiable by JHOVE. (Currently DSpace accepts a deposit's file extension as gospel, so a user could tack a ".txt" extension onto a GIF and DSpace would assign the incorrect format to the file based on that incorrect extension.) This patch also contains code to check the file for the presence of viruses."
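To illustrate the kind of mismatch described in the proposal, here is a minimal sketch in Python that compares a file's extension against its leading "magic" bytes. It is not the patch's code and does not use JHOVE; the signature table and the function names `sniff_format` and `extension_matches_content` are hypothetical, and real tools such as JHOVE apply far more thorough validation.

```python
from typing import Optional

# Hypothetical, deliberately tiny signature table for illustration only.
# Keys are the extensions we know how to verify; values are byte prefixes.
MAGIC_SIGNATURES = {
    "gif": (b"GIF87a", b"GIF89a"),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "pdf": (b"%PDF-",),
}


def sniff_format(header: bytes) -> Optional[str]:
    """Return the format whose signature prefixes the header, or None."""
    for name, signatures in MAGIC_SIGNATURES.items():
        if any(header.startswith(sig) for sig in signatures):
            return name
    return None


def extension_matches_content(filename: str, header: bytes) -> bool:
    """Check that the extension does not contradict the sniffed content."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    detected = sniff_format(header)
    if detected is None:
        # Unrecognised content: only a problem if the extension claims
        # one of the formats we are able to verify.
        return ext not in MAGIC_SIGNATURES
    return ext == detected


# A GIF renamed to ".txt" is flagged as a mismatch.
print(extension_matches_content("picture.txt", b"GIF89a\x01\x00"))
```

A check along these lines only catches gross mismatches; it says nothing about whether the file is well-formed, which is the part a validator like JHOVE would add.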
DCAT review: This patch seems to be doing two things: a) integrating with JHOVE, something that would be of strong interest for any repository that aims to preserve its contents in the long term, and b) using virus checking tools (based on ClamAV) as part of that process. One can imagine that it would be interesting to have the virus tools without using the JHOVE package, so it may be worth exploring separating these. It may well be useful to encourage a community discussion about what tools would be useful, now that the curation framework is there (although I haven't had the time to check what it actually does).
The ticket is already assigned to Richard Rodgers as he will need to assess how this would work with the new curation framework that came with 1.7.
DCAT initial assessment: Relevant; Medium-High or High priority
Next steps: Initially it would be useful to check with Richard Rodgers what his take on this is (which I'm happy to do). Also, since Jim is from Michigan, where the proposal seems to originate, he may be able to provide more detail.
- If you agree with the above assessment and have no additional comments, you can simply respond with a +1.
- If you disagree but have no comments, a -1 works, and if you have no opinion at all, 0 is fine. (And encouraged, since that means we know you've had a chance to weigh in.)
- If you do have comments or other ideas, you're not limited to the numbers, of course. So please do share your thoughts!
7 Comments
Elin Stangeland
+1
Jim Ottaviani
+1 (though I have an obvious bias and vested interest regarding this one, since we developed it here)
Iryna Kuchma
+1
Sarah Shreeves
+1
Robin Taylor
+1 I don't want to sidetrack this conversation, but I would like to propose wider changes. I would like to see the file upload moved to be the first step in the submission process. I would like to see its file type verified as described above, and I would like to see it checked for viruses. Then, depending on the file and package types, we could extract metadata from the uploaded file and prepopulate the item metadata. There are a number of package types to which this would apply, e.g. METS and SCORM, but perhaps the most interesting are the docx zip files that can now be output from Microsoft Word using the Author plugin.
Most of the functionality to do this is already available in DSpace; it just needs to be pulled together. I would happily volunteer to undertake the work for 1.8, as it's in line with the requirements of my institution.
Tim Donohue
Just a note to mention that the Developers discussed this DCAT review on Feb 23, 2011. Elin Stangeland was also in attendance.
Full discussion thread is available in our IRC logs for that day: http://irclogs.duraspace.org/index.php?date=2011-02-23
Here's a brief summary of what we came up with:
Elin Stangeland
To conclude on this issue: the virus part of this patch has been picked up and amended slightly to allow virus checks on deposit in addition to workflow. It should be included in DSpace 1.8 (Robin Taylor is responsible for this release).
The file format verification will be taken forward as well. I'm being told that there is work going on using DROID initially within the curation framework (if this is not the case then this patch will be brought forward instead). JHOVE is more extensive than DROID (in fact it utilises DROID for some tasks), so it is still worth considering in the longer term. I'll keep an eye on things, including what is going on with the JHOVE2 developments, but hope that everyone is OK with me (and Robin) closing this particular discussion.