DSpace can apply filters or transformations to files/bitstreams, creating new content. Filters are included that extract text for full-text searching, and create thumbnails for items that contain images. The media filters are controlled by the dspace filter-media script, which traverses the asset store, invoking all configured MediaFilter or FormatFilter classes on files/bitstreams (see Configuring Media Filters for more information on how they are configured).
Below is a listing of all currently available Media Filters, and what they actually do:
...
Please note that the filter-media script automatically updates the DSpace search index by default (see Legacy methods for re-indexing content). This is the recommended way to run the script. Should you wish to disable the index update, you can pass the -n flag to the script (see Executing (via Command Line) below).
The media filter plugin configuration filter.plugins in dspace.cfg contains a list of all enabled media/format filter plugins (see Configuring Media Filters for more information). By modifying the value of filter.plugins you can disable or enable MediaFilter plugins.
The media filter system is intended to be run from the command line (or regularly as a cron task):
...
[dspace]/bin/dspace filter-media -h
[dspace]/bin/dspace filter-media -f
[dspace]/bin/dspace filter-media -i 123456789/2
[dspace]/bin/dspace filter-media -m 1000
[dspace]/bin/dspace filter-media -n
Use -n if you intend to run index-update elsewhere.
[dspace]/bin/dspace filter-media -p "PDF Text Extractor","Word Text Extractor"
[dspace]/bin/dspace filter-media -s 123456789/9,123456789/100
[dspace]/bin/dspace filter-media -s `less filter-skiplist.txt`
[dspace]/bin/dspace filter-media -v
New Media Filters must implement the org.dspace.app.mediafilter.FormatFilter interface. See the Creating a new Media/Format Filter topic and the comments in the source file FormatFilter.java for more information on the methods you need to implement. In theory, filters could be implemented in any programming language (C, Perl, etc.). However, they need to be invoked by the Java code in the Media Filter class that you create. For example:
...
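As a rough illustration of the shape such a class takes, the sketch below implements a trivial filter that upper-cases text. To stay self-contained it declares a simplified stand-in interface, SketchFormatFilter, rather than importing the real org.dspace.app.mediafilter.FormatFilter. The method names follow what FormatFilter.java is understood to declare, but the real interface has additional methods and different parameters, so check the source file before copying anything. The class names, the "TEXT" bundle name and the "Text" format string are assumptions chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical, simplified stand-in for org.dspace.app.mediafilter.FormatFilter.
// The real interface differs; consult FormatFilter.java for the actual contract.
interface SketchFormatFilter {
    String getFilteredName(String sourceName);   // name of the generated file
    String getBundleName();                      // bundle that receives the output
    String getFormatString();                    // format of the generated file
    String getDescription();                     // human-readable description
    InputStream getDestinationStream(InputStream source) throws IOException;
}

// Illustrative filter: produces an upper-cased plain-text copy of the source.
class UpperCaseTextFilter implements SketchFormatFilter {
    @Override
    public String getFilteredName(String sourceName) {
        return sourceName + ".txt";
    }

    @Override
    public String getBundleName() {
        return "TEXT"; // assumed bundle name
    }

    @Override
    public String getFormatString() {
        return "Text"; // assumed format name
    }

    @Override
    public String getDescription() {
        return "Upper-cased text";
    }

    @Override
    public InputStream getDestinationStream(InputStream source) throws IOException {
        String text = new String(source.readAllBytes(), StandardCharsets.UTF_8);
        byte[] filtered = text.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(filtered);
    }
}

class UpperCaseTextFilterDemo {
    // Runs a filter over an in-memory string and returns the filtered text.
    static String apply(SketchFormatFilter filter, String input) {
        try (InputStream in = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8))) {
            return new String(filter.getDestinationStream(in).readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SketchFormatFilter filter = new UpperCaseTextFilter();
        System.out.println(filter.getFilteredName("report.doc") + ": " + apply(filter, "hello dspace"));
    }
}
```

In a real deployment the filter-media script, not a main method, would call these methods for each bitstream; the demo class exists only so the sketch can be run in isolation.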
If you have a more complex Media/Format Filter, which performs multiple filtering operations or conversions for different formats (e.g. conversion from Word to PDF and conversion from Excel to CSV), you should define a class which implements the FormatFilter interface while also extending the SelfNamedPlugin class. For example:
...
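The idea of one class serving several named conversions can be sketched as follows. This is a minimal, self-contained illustration and not the DSpace implementation: SketchSelfNamedPlugin and SketchNamedFilter are hypothetical stand-ins for org.dspace.core.SelfNamedPlugin and FormatFilter, only one representative filter method is shown, and the plugin names "Word to PDF" and "Excel to CSV" are assumed for the example.

```java
// Hypothetical stand-in for org.dspace.core.SelfNamedPlugin: it only records
// the name under which a plugin instance was configured.
abstract class SketchSelfNamedPlugin {
    private String instanceName;

    void setPluginInstanceName(String name) {
        this.instanceName = name;
    }

    String getPluginInstanceName() {
        return instanceName;
    }
}

// Hypothetical one-method stand-in for the FormatFilter interface.
interface SketchNamedFilter {
    String getFilteredName(String sourceName);
}

// One class, two named behaviours, selected by the plugin instance name.
class MultiFormatFilter extends SketchSelfNamedPlugin implements SketchNamedFilter {
    static final String WORD_TO_PDF = "Word to PDF";   // assumed plugin name
    static final String EXCEL_TO_CSV = "Excel to CSV"; // assumed plugin name

    // The names under which this class can be instantiated.
    static String[] getPluginNames() {
        return new String[] {WORD_TO_PDF, EXCEL_TO_CSV};
    }

    // The output file name depends on which named conversion this instance performs.
    @Override
    public String getFilteredName(String sourceName) {
        String name = getPluginInstanceName();
        if (WORD_TO_PDF.equals(name)) {
            return sourceName + ".pdf";
        }
        if (EXCEL_TO_CSV.equals(name)) {
            return sourceName + ".csv";
        }
        throw new IllegalStateException("Unknown plugin name: " + name);
    }
}

class MultiFormatFilterDemo {
    // Creates an instance configured under the given plugin name.
    static MultiFormatFilter named(String name) {
        MultiFormatFilter filter = new MultiFormatFilter();
        filter.setPluginInstanceName(name);
        return filter;
    }

    public static void main(String[] args) {
        for (String name : MultiFormatFilter.getPluginNames()) {
            System.out.println(name + " -> " + named(name).getFilteredName("budget"));
        }
    }
}
```

Branching on the instance name keeps both conversions in a single class, at the cost of each method having to check which name it was configured under.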