The org.dspace.core package provides some basic classes that are used throughout the DSpace code.
The configuration service is responsible for reading the main dspace.cfg properties file, managing the 'template' configuration files for other applications such as Apache, and for obtaining the text for e-mail messages.
The system is configured by editing the relevant files in [dspace]/config, as described in the configuration section.
When editing configuration files for applications that DSpace uses, such as Apache Tomcat, you may want to edit the copy in [dspace-source] and then run ant update or ant overwrite_configs rather than editing the 'live' version directly! This will ensure you have a backup copy of your modified configuration files, so that they are not accidentally overwritten in the future.
The ConfigurationService class can also be invoked as a command line tool:
[dspace]/bin/dspace dsprop property.name
This writes the value of property.name from dspace.cfg to the standard output, so that shell scripts can access the DSpace configuration. If the property has no value, nothing is written.
For many more details on configuration in DSpace, see Configuration Reference.
This class contains constants that are used to represent types of objects and actions in the database. For example, authorization policies can relate to objects of different types, so the resourcepolicy table has columns resource_id, which is the internal ID of the object, and
Here are some of the most commonly used constants you might come across:
DSpace types
DSpace actions
Refer to the org.dspace.core.Constants for all of the Constants.
The Context class is central to the DSpace operation. Any code that wishes to use any API in the business logic layer must first create a Context object. This is akin to opening a connection to a database (which is in fact one of the things that happens).
A context object is involved in most method calls and object constructors, so that the method or object has access to information about the current operation. When the context object is constructed, the following information is automatically initialized:
You should always abort a context if any error happens during its lifespan; otherwise the data in the system may be left in an inconsistent state. You can also commit a context, which means that any changes are written to the database, and the context is kept active for further use.
Sending e-mails is pretty easy. Just use the configuration manager's getEmail method, set the arguments and recipients, and send.
The e-mail texts are stored in [dspace]/config/emails. They are processed by the standard java.text.MessageFormat. At the top of each e-mail are listed the appropriate arguments that should be filled out by the sender. Example usage is shown in the org.dspace.core.Email Javadoc API documentation.
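To illustrate how java.text.MessageFormat fills numbered arguments into a template, here is a minimal standalone sketch. It does not use org.dspace.core.Email; the class name EmailTemplateSketch, the fill helper and the template text are assumptions made up for this example, not the contents of any file in [dspace]/config/emails.

```java
import java.text.MessageFormat;

class EmailTemplateSketch {
    // Hypothetical template text; real templates live in [dspace]/config/emails.
    static final String TEMPLATE =
        "Dear {0},\n\nYour submission \"{1}\" has been archived with handle {2}.";

    /** Fills the numbered placeholders {0}, {1}, ... with the arguments, in order. */
    static String fill(String template, Object... args) {
        return MessageFormat.format(template, args);
    }

    public static void main(String[] args) {
        System.out.println(fill(TEMPLATE, "Jane Doe", "My Thesis", "1721.1/1686"));
    }
}
```

Note that MessageFormat treats a single quote as an escape character, so a literal apostrophe in a template has to be written as two single quotes.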
The log manager consists of a method that creates a standard log header, and returns it as a string suitable for logging. Note that this class does not actually write anything to the logs; the log header returned should be logged directly by the sender using an appropriate Log4J call, so that information about where the logging is taking place is also stored.
The level of logging can be configured on a per-package or per-class basis by editing [dspace]/config/log4j.properties. You will need to stop and restart Tomcat for the changes to take effect.
A typical log entry looks like this:
2002-11-11 08:11:32,903 INFO org.dspace.app.webui.servlet.DSpaceServlet @ anonymous:session_id=BD84E7C194C2CF4BD0EC3A6CAD0142BB:view_item:handle=1721.1/1686
This breaks down as follows:
Date and time, milliseconds | 2002-11-11 08:11:32,903
Level (FATAL, WARN, INFO or DEBUG) | INFO
Java class | org.dspace.app.webui.servlet.DSpaceServlet
(separator) | @
User email or anonymous | anonymous
(separator) | :
Extra log info from context | session_id=BD84E7C194C2CF4BD0EC3A6CAD0142BB
(separator) | :
Action | view_item
(separator) | :
Extra info | handle=1721.1/1686
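The breakdown above can be turned into a small parser. The following is a sketch only: the class LogLineSketch and its regular expression are assumptions written for this example, not part of DSpace or of the log-reporter script. It also assumes that the user and the context info contain no colon, which the sample entry satisfies but which may not hold for every log line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineSketch {
    // timestamp, level, class, "@", then user:contextInfo:action:extraInfo
    private static final Pattern LINE = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) (\\w+) +(\\S+) @ "
        + "([^:]*):([^:]*):([^:]*):(.*)$");

    /**
     * Returns {timestamp, level, class, user, contextInfo, action, extraInfo},
     * or null if the line does not have the expected shape.
     */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[7];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "2002-11-11 08:11:32,903 INFO "
            + "org.dspace.app.webui.servlet.DSpaceServlet @ "
            + "anonymous:session_id=BD84E7C194C2CF4BD0EC3A6CAD0142BB:"
            + "view_item:handle=1721.1/1686";
        for (String field : parse(sample)) {
            System.out.println(field);
        }
    }
}
```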
The above format allows the logs to be easily parsed and analyzed. The [dspace]/bin/log-reporter script is a simple tool for analyzing logs. Try:
[dspace]/bin/log-reporter --help
It's a good idea to 'nice' this log reporter to avoid an impact on server performance.
Utils contains miscellaneous utility methods that are required in a variety of places throughout the code, and thus have no particular 'home' in a subsystem.
The content management API package org.dspace.content contains Java classes for reading and manipulating content stored in the DSpace system. This is the API that components in the application layer will probably use most.
Classes corresponding to the main elements in the DSpace data model (Community, Collection, Item, Bundle and
Each class generally has one or more static find methods, which are used to instantiate content objects. Constructors do not have public access and are just used internally. The reasons for this are:
"Constructing" an object may be misconstrued as the action of creating an object in the DSpace system; for example, one might expect something like:
    Context context = new Context();
    Item myItem = new Item(context, id);
to construct a brand new item in the system, rather than simply instantiating an in-memory instance of an object in the system.
Collection, Bundle and Bitstream do not have create methods; rather, one has to create an object using the relevant method on the container. For example, to create a collection, one must invoke createCollection on the community that the collection is to appear in:
    Context context = new Context();
    Community existingCommunity = Community.find(context, 123);
    Collection myNewCollection = existingCommunity.createCollection();
The primary reason for this is for determining authorization. In order to know whether an e-person may create an object, the system must know which container the object is to be added to. It makes no sense to create a collection outside of a community, and the authorization system does not have a policy for that.
Items are first created in the form of an implementation of InProgressSubmission. An InProgressSubmission represents an item under construction; once it is complete, it is installed into the main archive and added to the relevant collection by the InstallItem class. The org.dspace.content package provides an implementation of InProgressSubmission called WorkspaceItem; this is a simple implementation that contains some fields used by the Web submission UI. The org.dspace.workflow package also contains an implementation called WorkflowItem which represents a submission undergoing a workflow process.
In the previous chapter there is an overview of the item ingest process which should clarify the previous paragraph. Also see the section on the workflow system.
Community and BitstreamFormat do have static create methods; one must be a site administrator to have authorization to invoke these.
Classes whose names begin with DC are for manipulating Dublin Core metadata, as explained below.
The FormatIdentifier class attempts to guess the bitstream format of a particular bitstream. Presently, it does this simply by looking at any file extension in the bitstream name and matching it up with the file extensions associated with bitstream formats. Hopefully this can be greatly improved in the future!
The ItemIterator class allows items to be retrieved from storage one at a time, and is returned by methods that may return a large number of items, more than would be desirable to have in memory at once.
The ItemComparator class is an implementation of the standard java.util.Comparator that can be used to compare and order items based on a particular Dublin Core metadata field.
When creating, modifying or for whatever reason removing data with the content management API, it is important to know when changes happen in-memory, and when they occur in the physical DSpace storage.
Primarily, one should note that no change made using a particular org.dspace.core.Context object will actually be made in the underlying storage unless complete or commit is invoked on that Context. If anything should go wrong during an operation, the context should always be aborted by invoking abort, to ensure that no inconsistent state is written to the storage.
Additionally, some changes made to objects only happen in-memory. In these cases, invoking the update method lines up the in-memory changes to occur in storage when the Context is committed or completed. In general, methods that change any metadata field only make the change in-memory; methods that involve relationships with other objects in the system line up the changes to be committed with the context. See individual methods in the API Javadoc.
Some examples to illustrate this are shown below:
Context context = new Context(); Bitstream b = Bitstream.find(context, 1234); b.setName("newfile.txt"); b.update(); context.complete(); | Will change storage
Context context = new Context(); Bitstream b = Bitstream.find(context, 1234); b.setName("newfile.txt"); b.update(); context.abort(); | Will not change storage (context aborted)
Context context = new Context(); Bitstream b = Bitstream.find(context, 1234); b.setName("newfile.txt"); context.complete(); | The new name will not be stored since update was not invoked
Context context = new Context(); Bitstream bs = Bitstream.find(context, 1234); Bundle bnd = Bundle.find(context, 5678); bnd.add(bs); context.complete(); | The bitstream will be included in the bundle, since update doesn't need to be called
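The first three rows of the table can be mimicked with a toy model. This is a sketch under loud assumptions: ToyContext and ToyBitstream are hypothetical stand-ins written for this example and share nothing with the real org.dspace classes except the method names setName, update, complete and abort. The model only shows the idea that update queues a change and that only complete writes queued changes to storage.

```java
import java.util.HashMap;
import java.util.Map;

class ContextSemanticsSketch {
    /** Toy stand-in for Context: queued changes reach storage only on complete(). */
    static class ToyContext {
        private final Map<Integer, String> storage;
        private final Map<Integer, String> pending = new HashMap<>();

        ToyContext(Map<Integer, String> storage) { this.storage = storage; }

        void queue(int id, String name) { pending.put(id, name); }
        void complete() { storage.putAll(pending); pending.clear(); }
        void abort() { pending.clear(); }
    }

    /** Toy stand-in for a content object with a single metadata field. */
    static class ToyBitstream {
        private final ToyContext context;
        private final int id;
        private String name;

        ToyBitstream(ToyContext context, int id, String name) {
            this.context = context;
            this.id = id;
            this.name = name;
        }

        void setName(String name) { this.name = name; } // in-memory change only
        void update() { context.queue(id, name); }      // line the change up for storage
    }

    /** Runs one scenario and returns the name found in storage afterwards. */
    static String run(boolean callUpdate, boolean complete) {
        Map<Integer, String> storage = new HashMap<>();
        storage.put(1234, "oldfile.txt");
        ToyContext context = new ToyContext(storage);
        ToyBitstream b = new ToyBitstream(context, 1234, storage.get(1234));
        b.setName("newfile.txt");
        if (callUpdate) {
            b.update();
        }
        if (complete) {
            context.complete();
        } else {
            context.abort();
        }
        return storage.get(1234);
    }

    public static void main(String[] args) {
        System.out.println(run(true, true));   // update + complete
        System.out.println(run(true, false));  // update + abort
        System.out.println(run(false, true));  // complete without update
    }
}
```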
Instantiating some content objects also causes other content objects to be loaded into memory.
Instantiating a Bitstream object causes the appropriate BitstreamFormat object to be instantiated. Of course the Bitstream object does not load the underlying bits from the bitstream store into memory!
Instantiating a Bundle object causes the appropriate Bitstream objects (and hence BitstreamFormats) to be instantiated.
Instantiating an Item object causes the appropriate Bundle objects (etc.) and hence BitstreamFormats to be instantiated. All the Dublin Core metadata associated with that item are also loaded into memory.
The reasoning behind this is that for the vast majority of cases, anyone instantiating an item object is going to need information about the bundles and bitstreams within it, and this methodology allows that to be done in the most efficient way and is simple for the caller. For example, in the Web UI, the servlet (controller) needs to pass information about an item to the viewer (JSP), which needs to have all the information in-memory to display the item without further accesses to the database which may cause errors mid-display.
You do not need to worry about multiple in-memory instantiations of the same object, or any inconsistencies that may result; the Context object keeps a cache of the instantiated objects. The find methods of classes in org.dspace.content will use a cached object if one exists.
It may be that in enough cases this automatic instantiation of contained objects reduces performance in situations where it is important; if this proves to be true the API may be changed in the future to include a loadContents method or somesuch, or perhaps a Boolean parameter indicating what to do will be added to the find methods.
When a Context object is completed, aborted or garbage-collected, any objects instantiated using that context are invalidated and should not be used (in much the same way an AWT button is invalid if the window containing it is destroyed).
The Metadatum class is a simple container that represents a single Dublin Core-like element, optional qualifier, value and language. Note that since DSpace 1.4 the MetadataValue and associated classes are preferred (see Support for Other Metadata Schemas). The other classes starting with DC are utility classes for handling types of data in Dublin Core, such as people's names and dates. As supplied, the DSpace registry of elements and qualifiers corresponds to the Library Application Profile for Dublin Core. It should be noted that these utility classes assume that the values will be in a certain syntax, which will be true for all data generated within the DSpace system, but since Dublin Core does not always define strict syntax, this may not be true for Dublin Core originating outside DSpace.
Below is the specific syntax that DSpace expects various fields to adhere to:
Element | Qualifier | Syntax | Helper Class
date | Any or unqualified | ISO 8601 in the UTC time zone, with either year, month, day, or second precision. Examples: "2000", "2002-10", "2002-08-14", "1999-01-01T14:35:23Z" | DCDate
contributor | Any or unqualified | In general last name, then a comma, then first names, then any additional information like "Jr.". If the contributor is an organization, then simply the name. Examples: "Doe, John", "Smith, John Jr.", "van Dyke, Dick", "Massachusetts Institute of Technology" | DCPersonName
language | iso | A two letter code taken from ISO 639, followed optionally by a two letter country code taken from ISO 3166. Examples: "en", "fr", "en_US" | DCLanguage
relation | ispartofseries | The series name, followed by a semicolon, followed by the number in that series. Alternatively, just free text. Examples: "MIT-TR; 1234", "My Report Series; ABC-1234", "NS1234" | DCSeriesNumber
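As an illustration of the four date precisions listed in the table, the sketch below classifies a value by its shape. It is not the DCDate helper class: DcDateSketch, its Precision enum and the regular expressions are assumptions for this example, and the check looks only at the pattern of digits, not at whether the month or day is valid.

```java
import java.util.regex.Pattern;

class DcDateSketch {
    enum Precision { YEAR, MONTH, DAY, SECOND }

    private static final Pattern YEAR = Pattern.compile("\\d{4}");
    private static final Pattern MONTH = Pattern.compile("\\d{4}-\\d{2}");
    private static final Pattern DAY = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");
    private static final Pattern SECOND =
        Pattern.compile("\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z");

    /** Returns the precision implied by the value's shape, or null if it fits none. */
    static Precision precisionOf(String value) {
        if (YEAR.matcher(value).matches()) return Precision.YEAR;
        if (MONTH.matcher(value).matches()) return Precision.MONTH;
        if (DAY.matcher(value).matches()) return Precision.DAY;
        if (SECOND.matcher(value).matches()) return Precision.SECOND;
        return null;
    }

    public static void main(String[] args) {
        String[] samples = { "2000", "2002-10", "2002-08-14", "1999-01-01T14:35:23Z" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + precisionOf(sample));
        }
    }
}
```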
To support additional metadata schemas a new set of metadata classes have been added. These are backwards compatible with the DC classes and should be used rather than the DC specific classes wherever possible. Note that hierarchical metadata schemas are not currently supported, only flat schemas (such as DC) are able to be defined.
The MetadataField class describes a metadata field by schema, element and optional qualifier. The value of a MetadataField is described by a MetadataValue which is roughly equivalent to the older Metadatum class. Finally the MetadataSchema class is used to describe supported schemas. The DC schema is supported by default. Refer to the javadoc for method details.
The Packager plugins let you ingest a package to create a new DSpace Object, and disseminate a content Object as a package. A package is simply a data stream; its contents are defined by the packager plugin's implementation.
To ingest an object, which is currently only implemented for Items, the sequence of operations is:
Here is an example package ingestion code fragment:
    Collection collection = ...; // find target collection
    InputStream source = ...;
    PackageParameters params = ...;
    String license = null;
    PackageIngester sip = (PackageIngester) PluginManager
            .getNamedPlugin(PackageIngester.class, packageType);
    WorkspaceItem wi = sip.ingest(context, collection, source, params, license);
Here is an example of a package dissemination:
    OutputStream destination = ...;
    PackageParameters params = ...;
    DSpaceObject dso = ...;
    PackageDisseminator dip = (PackageDisseminator) PluginManager
            .getNamedPlugin(PackageDisseminator.class, packageType);
    dip.disseminate(context, dso, params, destination);
In DSpace 6, the old "PluginManager" was replaced by org.dspace.core.service.PluginService, which performs the same activities/actions.
The PluginService is a very simple component container. It creates and organizes components (plugins), and helps select a plugin in the cases where there are many possible choices. It also gives some limited control over the life cycle of a plugin.
The following terms are important in understanding the rest of this section:
The Plugin Service supports three different patterns of usage:
Single plugins, obtained with the getSinglePlugin() method.
Sequence plugins, obtained with the getPluginSequence() method.
Named plugins, obtained with the getNamedPlugin() method and listed with the getAllPluginNames() method.
Named plugins can get their names either from the configuration or, for a variant called self-named plugins, from within the plugin itself.
Self-named plugins are necessary because one plugin implementation can be configured itself to take on many "personalities", each of which deserves its own plugin name. It is already managing its own configuration for each of these personalities, so it makes sense to allow it to export them to the Plugin Service rather than expecting the plugin configuration to be kept in sync with its own configuration.
An example helps clarify the point: There is a named plugin that does crosswalks, call it CrosswalkPlugin. It has several implementations that crosswalk some kind of metadata. Now we add a new plugin which uses XSL stylesheet transformation (XSLT) to crosswalk many types of metadata, so the single plugin can act like many different plugins, depending on which stylesheet it employs.
This XSLT-crosswalk plugin has its own configuration that maps a Plugin Name to a stylesheet; it has to, since of course the Plugin Service doesn't know anything about stylesheets. It becomes a self-named plugin, so that it reads its configuration data, gets the list of names to which it can respond, and passes those on to the Plugin Service.
When the Plugin Service creates an instance of the XSLT-crosswalk, it records the Plugin Name that was responsible for that instance. The plugin can look at that Name later in order to configure itself correctly for the Name that created it. This mechanism is all part of the SelfNamedPlugin class which is part of any self-named plugin.
The most common thing you will do with the Plugin Service is obtain an instance of a plugin. To request a plugin, you must always specify the plugin interface you want. You will also supply a name when asking for a named plugin.
A sequence plugin is returned as an array of Objects since it is actually an ordered list of plugins.
See the getSinglePlugin(), getPluginSequence(), getNamedPlugin() methods.
When PluginService fulfills a request for a plugin, a new instance is always created.
The PluginService can list all the names of the Named Plugins which implement an interface. You may need this, for example, to implement a menu in a user interface that presents a choice among all possible plugins. See the getAllPluginNames() method.
Note that it only returns the plugin name, so if you need a more sophisticated or meaningful "label" (i.e. a key into the I18N message catalog) then you should add a method to the plugin itself to return that.
Note: The PluginService refers to interfaces and classes internally only by their names whenever possible, to avoid loading classes until absolutely necessary (i.e. to create an instance). As you'll see below, self-named classes still have to be loaded to query them for names, but for the most part it can avoid loading classes. This saves a lot of time at start-up and keeps the JVM memory footprint down, too. As the Plugin Service gets used for more classes, this will become a greater concern.
The only downside of "on-demand" loading is that errors in the configuration don't get discovered right away. The solution is to call the checkConfiguration() method after making any changes to the configuration.
The LegacyPluginServiceImpl class is the default PluginService implementation. While it is possible to implement your own version of PluginService, no other implementations are provided with DSpace.
Here are the public methods, followed by explanations:
Object getSinglePlugin(Class interfaceClass) - Returns an instance of the singleton (single) plugin implementing the given interface. There must be exactly one single plugin configured for this interface, otherwise the PluginConfigurationError is thrown. Note that this is the only "get plugin" method which throws an exception. It is typically used at initialization time to set up a permanent part of the system so any failure is fatal. See the plugin.single configuration key for configuration details.
Object[] getPluginSequence(Class interfaceClass) - Returns instances of all plugins that implement the interface interfaceClass, in an Array. Returns an empty array if there are no matching plugins. The order of the plugins in the array is the same as their class names in the configuration's value field. See the plugin.sequence configuration key for configuration details.
Object getNamedPlugin(Class interfaceClass, String name) - Returns an instance of a plugin that implements the interface interfaceClass and is bound to a name matching name. If there is no matching plugin, it returns null. The names are matched by String.equals(). See the plugin.named and plugin.selfnamed configuration keys for configuration details.
String[] getAllPluginNames(Class interfaceClass) - Returns all of the names under which a named plugin implementing the interface interfaceClass can be requested (with getNamedPlugin()). The array is empty if there are no matches. Use this to populate a menu of plugins for interactive selection, or to document what the possible choices are. The names are NOT returned in any predictable order, so you may wish to sort them first. Note: Since a plugin may be bound to more than one name, the list of names this returns does not represent the list of plugins. To get the list of unique implementation classes corresponding to the names, you might have to eliminate duplicates (i.e. create a Set of classes).
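The two pieces of advice in the getAllPluginNames() description (sort the names, and collapse duplicates into a Set of classes) can be sketched as follows. The helper class PluginNamesSketch and its methods are assumptions for this example and do not call the PluginService. In real code the names would come from getAllPluginNames() and the instances from calling getNamedPlugin() once per name; here plain strings and objects stand in for them.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

class PluginNamesSketch {
    /** Returns a sorted copy of the names, since the service guarantees no order. */
    static String[] sortedNames(String[] names) {
        String[] copy = names.clone();
        Arrays.sort(copy);
        return copy;
    }

    /**
     * Collapses plugin instances (one per name) into their distinct
     * implementation classes, so a class bound to several names is counted once.
     */
    static Set<Class<?>> uniqueClasses(Collection<?> pluginInstances) {
        Set<Class<?>> classes = new LinkedHashSet<>();
        for (Object plugin : pluginInstances) {
            if (plugin != null) { // a lookup that found no plugin yields null
                classes.add(plugin.getClass());
            }
        }
        return classes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedNames(new String[] { "TeX", "GIF", "JPEG" })));
        // Stand-in instances: two share an implementation class, one does not.
        Collection<Object> instances =
            Arrays.<Object>asList(new StringBuilder(), new StringBuilder(), new Object());
        System.out.println(uniqueClasses(instances).size());
    }
}
```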
A named plugin implementation must extend this class if it wants to supply its own Plugin Name(s). See Self-Named Plugins for why this is sometimes necessary.
    abstract class SelfNamedPlugin
    {
        // Your class must override this:
        // Return all names by which this plugin should be known.
        public static String[] getPluginNames();

        // Returns the name under which this instance was created.
        // This is implemented by SelfNamedPlugin and should NOT be overridden.
        public String getPluginInstanceName();
    }
    public class PluginConfigurationError extends Error
    {
        public PluginConfigurationError(String message);
    }
An error of this type means the caller asked for a single plugin, but either there was no single plugin configured matching that interface, or there was more than one. Either case causes a fatal configuration error.
    public class PluginInstantiationException extends RuntimeException
    {
        public PluginInstantiationException(String msg, Throwable cause)
    }
This exception indicates a fatal error when instantiating a plugin class. It should only be thrown when something unexpected happens in the course of instantiating a plugin, e.g. an access error, class not found, etc. Simply not finding a class in the configuration is not an exception.
This is a RuntimeException so it doesn't have to be declared, and can be passed all the way up to a generalized fatal exception handler.
All of the Plugin Service's configuration comes from the DSpace Configuration Service (see Configuration Reference). You can configure these characteristics of each plugin:
This entry configures a Single Plugin for use with getSinglePlugin():
    plugin.single.interface = classname
For example, this configures the class org.dspace.checker.SimpleDispatcher as the plugin for interface org.dspace.checker.BitstreamDispatcher:
    plugin.single.org.dspace.checker.BitstreamDispatcher=org.dspace.checker.SimpleDispatcher
This kind of configuration entry defines a Sequence Plugin, which is bound to a sequence of implementation classes. The key identifies the interface, and the value is a comma-separated list of classnames:
    plugin.sequence.interface = classname, ...
The plugins are returned by getPluginSequence() in the same order as their classes are listed in the configuration value.
For example, this entry configures Stackable Authentication with three implementation classes:
    plugin.sequence.org.dspace.eperson.AuthenticationMethod = \
        org.dspace.eperson.X509Authentication, \
        org.dspace.eperson.PasswordAuthentication, \
        edu.mit.dspace.MITSpecialGroup
There are two ways of configuring named plugins:
Plugins Named in the Configuration: A named plugin which gets its name(s) from the configuration is listed in this kind of entry:
    plugin.named.interface = classname = name [ , name.. ] [ classname = name.. ]
The syntax of the configuration value is: classname, followed by an equal-sign and then at least one plugin name. Bind more names to the same implementation class by adding them here, separated by commas. Names may include any character other than comma (,) and equal-sign (=).
For example, this entry creates one plugin with the names GIF, JPEG, and image/png, and another with the name TeX:
    plugin.named.org.dspace.app.mediafilter.MediaFilter = \
        org.dspace.app.mediafilter.JPEGFilter = GIF, JPEG, image/png \
        org.dspace.app.mediafilter.TeXFilter = TeX
This example shows a plugin name with an embedded whitespace character. Since comma (,) is the separator character between plugin names, spaces are legal (between words of a name; leading and trailing spaces are ignored). This plugin is bound to the names "Adobe PDF", "PDF", and "Portable Document Format".
    plugin.named.org.dspace.app.mediafilter.MediaFilter = \
        org.dspace.app.mediafilter.TeXFilter = TeX \
        org.dspace.app.mediafilter.PDFFilter = Adobe PDF, PDF, Portable Document Format
NOTE: Since there can only be one key with plugin.named. followed by the interface name in the configuration, all of the plugin implementations must be configured in that entry.
Self-Named Plugins: Since a self-named plugin supplies its own names through a static method call, the configuration only has to include its interface and classname:
    plugin.selfnamed.interface = classname [ , classname.. ]
The following example first demonstrates how the plugin class, XsltDisseminationCrosswalk, is configured to implement its own names "MODS" and "DublinCore". These come from the keys starting with "crosswalk.dissemination.stylesheet."; the value of each key is a stylesheet file. The class is then configured as a self-named plugin:
    crosswalk.dissemination.stylesheet.DublinCore = xwalk/TESTDIM-2-DC_copy.xsl
    crosswalk.dissemination.stylesheet.MODS = xwalk/mods.xsl
    plugin.selfnamed.org.dspace.content.metadata.DisseminationCrosswalk = \
        org.dspace.content.metadata.MODSDisseminationCrosswalk, \
        org.dspace.content.metadata.XsltDisseminationCrosswalk
NOTE: Since there can only be one key with plugin.selfnamed. followed by the interface name in the configuration, all of the plugin implementations must be configured in that entry. The MODSDisseminationCrosswalk class is only shown to illustrate this point.
Here are some usage examples to illustrate how the Plugin Service works.
The MediaFilterService implementation relies heavily on the Plugin Service. The MediaFilter classes become plugins named in the configuration. Refer to the Configuration Reference for further details.
This shows how to configure and access a single anonymous plugin, such as the BitstreamDispatcher plugin:
Configuration:
    plugin.single.org.dspace.checker.BitstreamDispatcher=org.dspace.checker.SimpleDispatcher
The following code fragment shows how dispatcher, the service object, is= initialized and used:
    BitstreamDispatcher dispatcher =
        (BitstreamDispatcher) PluginManager.getSinglePlugin(BitstreamDispatcher.class);
    int id = dispatcher.next();
    while (id != BitstreamDispatcher.SENTINEL)
    {
        /* do some processing here */
        id = dispatcher.next();
    }
This crosswalk plugin acts like many different plugins since it is configured with different XSL translation stylesheets. Since it already gets each of its stylesheets out of the DSpace configuration, it makes sense to have the plugin give PluginService the names to which it answers instead of forcing someone to configure those names in two places (and try to keep them synchronized).
Here is the configuration file listing both the plugin's own configuration and the PluginService config line:
    crosswalk.dissemination.stylesheet.DublinCore = xwalk/TESTDIM-2-DC_copy.xsl
    crosswalk.dissemination.stylesheet.MODS = xwalk/mods.xsl
    plugin.selfnamed.org.dspace.content.metadata.DisseminationCrosswalk = \
        org.dspace.content.metadata.XsltDisseminationCrosswalk
This look into the implementation shows how it finds configuration entries to populate the array of plugin names returned by the getPluginNames() method. Also note, in the getStylesheet() method, how it uses the plugin name that created the current instance (returned by getPluginInstanceName()) to find the correct stylesheet.
    import java.util.ArrayList;
    import java.util.Enumeration;
    import java.util.List;

    public class XsltDisseminationCrosswalk extends SelfNamedPlugin
    {
        ....
        // static, because the static getPluginNames() method below reads it
        private static final String prefix = "crosswalk.dissemination.stylesheet.";
        ....
        public static String[] getPluginNames()
        {
            List<String> aliasList = new ArrayList<String>();
            Enumeration pe = ConfigurationManager.propertyNames();
            while (pe.hasMoreElements())
            {
                String key = (String) pe.nextElement();
                // every key under the prefix contributes one plugin name
                if (key.startsWith(prefix))
                    aliasList.add(key.substring(prefix.length()));
            }
            return aliasList.toArray(new String[aliasList.size()]);
        }

        // get the crosswalk stylesheet for an instance of the plugin:
        private String getStylesheet()
        {
            return ConfigurationManager.getProperty(prefix + getPluginInstanceName());
        }
    }
The Stackable Authentication mechanism needs to know all of the plugins configured for the interface, in the order of configuration, since order is significant. It gets a Sequence Plugin from the Plugin Service. Refer to the Configuration Section on Stackable Authentication for further details.
The primary classes are:
org.dspace.content.WorkspaceItem | contains an Item before it enters a workflow
org.dspace.workflow.WorkflowItem | contains an Item while in a workflow
org.dspace.workflow.WorkflowService | responds to events, manages the WorkflowItem states. There are two implementations, the traditional, default workflow (described below) and Configurable Workflow.
org.dspace.content.Collection | contains List of defined workflow steps
org.dspace.eperson.Group | people who can perform workflow tasks are defined in EPerson Groups
org.dspace.core.Email | used to email messages to Group members and submitters
The default workflow system models the states of an Item in a state mach= ine with 5 states (SUBMIT, STEP_1, STEP_2, STEP_3, ARCHIVE.) These are the = three optional steps where the item can be viewed and corrected by differen= t groups of people. Actually, it's more like 8 states, with STEP_1_POOL, ST= EP_2_POOL, and STEP_3_POOL. These pooled states are when items are waiting = to enter the primary states. Optionally, you can also choose to enabl= e the enhanced, Configur= able Workflow, if you wish to have more control over your workflow step= s/states. (Note: the remainder of this description relates to the tradi= tional, default workflow. For more information on the Configurable Workflow= option, visit = Configurable Workflow.)
The WorkflowService is invoked by events. While an Item is being submitt= ed, it is held by a WorkspaceItem. Calling the start() method in the Workfl= owService converts a WorkspaceItem to a WorkflowItem, and begins processing= the WorkflowItem's state. Since all three steps of the workflow are option= al, if no steps are defined, then the Item is simply archived.
Workflows are set per Collection, and steps are defined by creating corresponding entries in the List named workflowGroup. If you wish the workflow to have a step 1, use the administration tools for Collections to create a workflow Group whose members should be able to view and approve the Item; workflowGroup[0] is then set to the ID of that Group.
If a step is defined in a Collection's workflow, then the WorkflowItem's state is set to that step's pooled state (STEP_x_POOL). In this pooled state the WorkflowItem waits for an EPerson in that group to claim the step's task. The WorkflowService emails the members of that Group, notifying them that there is a task to be performed (the text is defined in config/emails). When an EPerson goes to their 'My DSpace' page to claim the task, the WorkflowService is invoked with a claim event, and the WorkflowItem's state advances from STEP_x_POOL to STEP_x (where x is the corresponding step). The EPerson can also generate an 'unclaim' event, returning the WorkflowItem to the STEP_x_POOL.
Other events the WorkflowService handles include advance(), which advances the WorkflowItem to the next state. If there are no further states, then the WorkflowItem is removed, and the Item is then archived. An EPerson performing one of the tasks can reject the Item, which stops the workflow, rebuilds the WorkspaceItem for it and sends a rejection note to the submitter. More drastically, an abort() event is generated by the admin tools to cancel a workflow outright.
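The state transitions described above can be sketched as a small state machine. This is only an illustration of the description, not DSpace's implementation: the class WorkflowSketch, its State enum and the boolean array standing in for a Collection's workflowGroup list are hypothetical names invented for this example.

```java
import java.util.Arrays;

class WorkflowSketch {
    /** The eight states named in the text, in workflow order. */
    enum State { SUBMIT, STEP_1_POOL, STEP_1, STEP_2_POOL, STEP_2, STEP_3_POOL, STEP_3, ARCHIVE }

    // Hypothetical stand-in for a Collection's workflowGroup list:
    // stepDefined[i] is true when step i+1 has a Group assigned.
    private final boolean[] stepDefined;
    private State state = State.SUBMIT;

    WorkflowSketch(boolean... stepDefined) {
        // Steps that are not supplied count as undefined.
        this.stepDefined = Arrays.copyOf(stepDefined, 3);
    }

    State getState() { return state; }

    /** Leaves SUBMIT for the pool of the first defined step, or archives directly. */
    void start() {
        require(state == State.SUBMIT);
        state = poolOfFirstDefinedStepFrom(0);
    }

    /** An EPerson claims the task: STEP_x_POOL -> STEP_x. */
    void claim() {
        require(state.name().endsWith("_POOL"));
        state = State.values()[state.ordinal() + 1];
    }

    /** The EPerson returns the task: STEP_x -> STEP_x_POOL. */
    void unclaim() {
        require(isClaimedStep());
        state = State.values()[state.ordinal() - 1];
    }

    /** The task is done: move to the next defined step's pool, or archive. */
    void advance() {
        require(isClaimedStep());
        int completedStep = state.ordinal() / 2; // STEP_1 -> 1, STEP_2 -> 2, STEP_3 -> 3
        state = poolOfFirstDefinedStepFrom(completedStep);
    }

    private boolean isClaimedStep() {
        return state.name().startsWith("STEP_") && !state.name().endsWith("_POOL");
    }

    // index is zero-based: index 0 is step 1, whose pool state has ordinal 1.
    private State poolOfFirstDefinedStepFrom(int index) {
        for (int i = index; i < stepDefined.length; i++) {
            if (stepDefined[i]) return State.values()[2 * i + 1];
        }
        return State.ARCHIVE;
    }

    private static void require(boolean condition) {
        if (!condition) throw new IllegalStateException("event not valid in current state");
    }
}
```

With steps 1 and 3 defined, calling start(), claim(), advance() moves an item from SUBMIT through STEP_1_POOL and STEP_1 to STEP_3_POOL, skipping the undefined step 2. Rejection and abort are left out of the sketch.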
The org.dspace.administer package contains some classes for administering a DSpace system that are not generally needed by most applications.
The CreateAdministrator class is a simple command-line tool, executed via [dspace]/bin/dspace create-administrator, that creates an administrator e-person with information entered from standard input. This is generally used only once when a DSpace system is initially installed, to create an initial administrator who can then use the Web administration UI to further set up the system. This script does not check for authorization, since it is typically run before there are any e-people to authorize! Since it must be run as a command-line tool on the server machine, generally this shouldn't cause a problem. A possibility is to have the script only operate when there are no e-people in the system already, though in general, someone with access to command-line scripts on your server is probably in a position to do what they want anyway!
The DCType class is similar to the org.dspace.content.BitstreamFormat class. It represents an entry in the Dublin Core type registry, that is, a particular element and qualifier, or unqualified element. It is in the administer package because it is only generally required when manipulating the registry itself. Elements and qualifiers are specified as literals in org.dspace.content.Item methods and the org.dspace.content.Metadatum class. Only administrators may modify the Dublin Core type registry.
The org.dspace.administer.RegistryLoader class contains methods for initializing the Dublin Core type registry and bitstream format registry with entries in an XML file. Typically this is executed via the command line during the build process (see build.xml in the source). To see examples of the XML formats, see the files in config/registries in the source directory. There is no XML schema, so the files are not strictly validated when loaded.
DSpace keeps track of registered users with the org.dspace.eperson.EPerson class. The class has methods to create and manipulate an EPerson, such as get and set methods for first and last names, email, and password. (Actually, there is no getPassword() method; an MD5 hash of the password is stored, and can only be verified with the checkPassword() method.) There are find methods to find an EPerson by email (which is assumed to be unique), or to find all EPeople in the system.
The EPerson object should probably be reworked to allow for easy expansion; the current EPerson object tracks pretty much only what MIT was interested in tracking: first and last names, email, phone. The access methods are hardcoded and should probably be replaced with methods to access arbitrary name/value pairs for institutions that wish to customize what EPerson information is stored.
Groups are simply lists of EPerson objects. Other than membership, Group objects have only one other attribute: a name. Group names must be unique, so (for groups associated with workflows) we have adopted naming conventions where the role of the group is its name, such as COLLECTION_100_ADD. Groups add and remove EPerson objects with addMember() and removeMember() methods. One important thing to know about groups is that they store their membership in memory until the update() method is called, so when modifying a group's membership don't forget to invoke update() or your changes will be lost! Since group membership is used heavily by the authorization system, a fast isMember() method is also provided.
Two specific groups are created when DSpace is installed: Administrator (which can bypass authorization) and Anonymous (which is assigned to all sessions that are not logged in). The code expects these groups to exist. They cannot be renamed or deleted.
Another kind of Group is also implemented in DSpace: special Groups. The Context object for each session carries around a List of Group IDs that the user is also a member of; currently the MITUser Group ID is added to the list of a user's special groups if certain IP address or certificate criteria are met.
The primary classes are:
org.dspace.authorize.AuthorizeService | does all authorization, checking policies against Groups |
org.dspace.authorize.ResourcePolicy | defines all allowable actions for an object |
org.dspace.eperson.Group | all policies are defined in terms of EPerson Groups |
The authorization system is based on the classic 'police state' model of security; no action is allowed unless it is expressed in a policy. The policies are attached to resources (hence the name ResourcePolicy), and detail who can perform that action. The resource can be any of the DSpace object types, listed in org.dspace.core.Constants (BITSTREAM, ITEM, COLLECTION, etc.). The 'who' is made up of EPerson groups. The actions are also in Constants.java (READ, WRITE, ADD, etc.). The only non-obvious actions are ADD and REMOVE, which are authorizations for container objects. To be able to create an Item, you must have ADD permission in a Collection, which contains Items. (Communities, Collections, Items, and Bundles are all container objects.)
Currently most of the read policy checking is done with items; communities and collections are assumed to be openly readable, but items and their bitstreams are checked. Separate policy checks for items and their bitstreams enable policies that allow publicly readable items whose content may be partly restricted to certain groups.
Three new attributes have been introduced in the ResourcePolicy class as part of the DSpace Embargo Contribution:
While rpname and rpdescription are fields manageable by users, rptype is a field managed by the system. It represents the type of a resource policy, which can be one of the following:
A custom policy, created for the purpose of creating an embargo, could look like:
policy_id: 4847
resource_type_id: 2
resource_id: 89
action_id: 0
eperson_id:
epersongroup_id: 0
start_date: 2013-01-01
end_date:
rpname: Embargo Policy
rpdescription: Embargoed through 2012
rptype: TYPE_CUSTOM
The AuthorizeService class' authorizeAction(Context, object, action) is the primary source of all authorization in the system. It gets a list of all of the ResourcePolicies in the system that match the object and action. It then iterates through the policies, extracting the EPerson Group from each policy, and checks to see if the EPersonID from the Context is a member of any of those groups. If all of the policies are queried and no permission is found, then an AuthorizeException is thrown. An authorizeAction() method is also supplied that returns a boolean for applications that require higher performance.
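The policy-matching loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the AuthorizeService code: the Policy class, the DeniedException type and both method names are hypothetical, and the caller is assumed to supply the set of Group IDs the current user belongs to.

```java
import java.util.List;
import java.util.Set;

class AuthorizeSketch {
    /** Hypothetical, simplified stand-in for one ResourcePolicy: one object, one action, one group. */
    static final class Policy {
        final int resourceType;
        final int resourceId;
        final int action;
        final int groupId;

        Policy(int resourceType, int resourceId, int action, int groupId) {
            this.resourceType = resourceType;
            this.resourceId = resourceId;
            this.action = action;
            this.groupId = groupId;
        }
    }

    /** Hypothetical stand-in for AuthorizeException. */
    static class DeniedException extends Exception {
        DeniedException(String message) { super(message); }
    }

    /** Boolean variant: true if any matching policy names a group the user is in. */
    static boolean isAuthorized(List<Policy> policies, Set<Integer> userGroupIds,
                                int resourceType, int resourceId, int action) {
        for (Policy p : policies) {
            boolean matches = p.resourceType == resourceType
                    && p.resourceId == resourceId
                    && p.action == action;
            if (matches && userGroupIds.contains(p.groupId)) {
                return true;
            }
        }
        // No policy granted the action, so it is not allowed.
        return false;
    }

    /** Throwing variant: fails when no policy grants the action. */
    static void authorize(List<Policy> policies, Set<Integer> userGroupIds,
                          int resourceType, int resourceId, int action) throws DeniedException {
        if (!isAuthorized(policies, userGroupIds, resourceType, resourceId, action)) {
            throw new DeniedException("action " + action + " not permitted on object " + resourceId);
        }
    }
}
```

The sketch defaults to denial: an action is permitted only when some policy explicitly grants it to one of the user's groups.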
ResourcePolicies are very simple, and there are quite a lot of them. Each can only list a single group, a single action, and a single object. So each object will likely have several policies, and if multiple groups share permissions for actions on an object, each group will get its own policy. (It's a good thing they're small.)
All users are assumed to be part of the public group (ID=0). DSpace admins (ID=1) are automatically part of all groups, much like super-users in the Unix OS. The Context object also carries around a List of special groups, which are also first checked for membership. These special groups are used at MIT to indicate membership in the MIT community, something that is very difficult to enumerate in the database! When a user logs in with an MIT certificate or with an MIT IP address, the login code adds this MIT user group to the user's Context.
Where do items get their read policies? From their collection's read policy. There once was a separate item read default policy in each collection, and perhaps there will be again, since it appears that administrators are notoriously bad at defining collections' read policies. There is also code in place to enable policies that are timed, that is, policies that have a start and end date. However, the admin tools to enable these sorts of policies have not been written.
The org.dspace.handle package contains two classes; HandleService is used to create and look up Handles, and HandlePlugin is used to expose and resolve DSpace Handles for the outside world via the CNRI Handle Server code.
Handles are stored internally in the handle database table in the form:
1721.123/4567
Typically when they are used outside of the system they are displayed in either URI or "URL proxy" forms:
hdl:1721.123/4567
http://hdl.handle.net/1721.123/4567
It is the responsibility of the caller to extract the basic form from whichever displayed form is used.
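A caller might extract the basic form along the following lines. This is a minimal sketch: the class and method names are hypothetical, and it only recognizes the two displayed forms shown above.

```java
class HandleFormSketch {
    private static final String URI_PREFIX = "hdl:";
    private static final String PROXY_PREFIX = "http://hdl.handle.net/";

    /** Returns the basic prefix/suffix form for either displayed form, or the input unchanged. */
    static String toBasicForm(String displayed) {
        String handle = displayed.trim();
        if (handle.startsWith(URI_PREFIX)) {
            return handle.substring(URI_PREFIX.length());
        }
        if (handle.startsWith(PROXY_PREFIX)) {
            return handle.substring(PROXY_PREFIX.length());
        }
        // Assumed to be in basic form already.
        return handle;
    }
}
```

Both hdl:1721.123/4567 and http://hdl.handle.net/1721.123/4567 reduce to 1721.123/4567, which is the form stored in the handle table.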
The handle table maps these Handles to resource type/resource ID pairs, where resource type is a value from org.dspace.core.Constants and resource ID is the internal identifier (database primary key) of the object. This allows Handles to be assigned to any type of object in the system, though as explained in the functional overview, only communities, collections and items are presently assigned Handles.
HandleService contains static methods for:
Note that since the Handle server runs in a separate JVM from the DSpace Web applications, it uses a separate 'Log4J' configuration, because Log4J does not support multiple JVMs using the same daily rolling logs. This alternative configuration is located at [dspace]/config/log4j-handle-plugin.properties. The [dspace]/bin/start-handle-server script passes in the appropriate command line parameters so that the Handle server uses this configuration.
In addition to Handles, DSpace also provides basic support for DOIs (Digital Object Identifiers). For more information visit DOI Digital Object Identifier.
DSpace's search code is a simple, configurable API which currently wraps Apache Solr. See Discovery for more information on how to customize the default search settings, etc.
The org.dspace.search package also provides a 'harvesting' API. This allows callers to extract information about items modified within a particular timeframe, and within a particular scope (all of DSpace, or a community or collection). Currently this is used by the Open Archives Initiative metadata harvesting protocol application, and the e-mail subscription code.
The Harvest.harvest method is invoked with the required scope and start and end dates. Either date can be omitted. The dates should be in the ISO 8601, UTC time zone format used elsewhere in the DSpace system.
HarvestedItemInfo objects are returned. These objects are simple containers with basic information about the items falling within the given scope and date range. Depending on parameters passed to the harvest method, the containers and item fields may have been filled out with the IDs of communities and collections containing an item, and the corresponding Item object respectively. Electing not to have these fields filled out means the harvest operation executes considerably faster.
In case it is required, Harvest also offers a method for creating a single HarvestedItemInfo object, which might make things easier for the caller.
The browse API uses the same underlying technology as the Search API (Apache Solr, see also Discovery). It maintains indexes of dates, authors, titles and subjects, and allows callers to extract parts of these:
Ideally, a name that appears as an author for more than one item would appear in the author index only once. For example, 'Doe, John' may be the author of tens of items. However, in practice, authors' names often appear in slightly different forms, for example:
Doe, John
Doe, John Stewart
Doe, John S.
Currently, the above three names would all appear as separate entries in the author index even though they may refer to the same author. In order for an author of several papers to appear correctly once in the index, each item must specify exactly the same form of their name, which doesn't always happen in practice.
Date of Issue: Items are indexed by date of issue. This may be different from the date that an item appeared in DSpace; many items may have been originally published elsewhere beforehand. The Dublin Core field used is date.issued. The ordering of this index may be reversed so 'earliest first' and 'most recent first' orderings are possible. Note that the index is of items by date, as opposed to an index of dates. If 30 items have the same issue date (say 2002), then those 30 items all appear in the index adjacent to each other, as opposed to a single 2002 entry. Since dates in DSpace Dublin Core are in ISO 8601, all in the UTC time zone, a simple alphanumeric sort is sufficient to sort by date, including dealing with varying granularities of date reasonably. For example:
2001-12-10
2002
2002-04
2002-04-05
2002-04-09T15:34:12Z
2002-04-09T19:21:12Z
2002-04-10
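The claim that a plain alphanumeric sort orders these dates correctly can be checked with a few lines of Java. This is a sketch with hypothetical class and method names, not the browse code; it works on the date strings alone rather than on items.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class IssueDateSortSketch {
    /** Sorts ISO 8601 date strings of mixed granularity using plain string comparison. */
    static List<String> sortByIssueDate(List<String> dates, boolean mostRecentFirst) {
        List<String> sorted = new ArrayList<String>(dates);
        // Natural String ordering is lexicographic, which matches chronological
        // order here because the most significant fields come first.
        Collections.sort(sorted);
        if (mostRecentFirst) {
            Collections.reverse(sorted);
        }
        return sorted;
    }
}
```

A shorter value such as 2002 sorts before any longer value that begins with it, which is why a year-only date lands ahead of the more specific dates within that year.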
The API is generally invoked by creating a BrowseScope object, and setting the parameters for which particular part of an index you want to extract. This is then passed to the relevant Browse method call, which returns a BrowseInfo object which contains the results of the operation. The parameters set in the BrowseScope object are:
To illustrate, here is an example:
The results of invoking Browse.getItemsByTitle with the above parameters might look like this:
Rabble-Rousing Rabbis From Sardinia
Reality TV: Love It or Hate It?
FOCUS> The Really Exciting Research Video
Recreational Housework Addicts: Please Visit My House
Regional Television Variation Studies
Revenue Streams
Ridiculous Example Titles: I'm Out of Ideas
Note that in the case of title and date browses, Item objects are returned as opposed to actual titles. In these cases, you can specify the 'focus' to be a specific item, or a partial or full literal value. In the case of a literal value, if no entry in the index matches exactly, the closest match is used as the focus. It's quite reasonable to specify a focus of a single letter, for example.
Being able to specify a specific item to start at is particularly important with dates, since many items may have the same issue date. Say 30 items in a collection have the issue date 2002. To be able to page through the index 20 items at a time, you need to be able to specify exactly which item's 2002 is the focus of the browse; otherwise, each time you invoked the browse code, the results would start at the first item with the issue date 2002.
Author browses return String objects with the actual author names. You can only specify the focus as a full or partial literal String.
Another important point to note is that presently, the browse indexes contain metadata for all items in the main archive, regardless of authorization policies. This means that all items in the archive will appear to all users when browsing. Of course, should the user attempt to access a non-public item, the usual authorization mechanism will apply. Whether this approach is ideal is under review; implementing the browse API such that the results retrieved reflect a user's level of authorization may be possible, but rather tricky.
The checksum checker is used to verify every item within DSpace. DSpace calculates and records the checksum of every file submitted to it, so the checker can later determine whether a file has been changed. The idea is that the earlier you can identify that a file has changed, the more likely you would be able to record it (assuming it was not a wanted change).
The org.dspace.checker.CheckerCommand class implements the checksum checker tool. It calculates a checksum for each bitstream whose ID is in the most_recent_checksum table, and compares it against the last calculated checksum for that bitstream.
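The comparison at the heart of the checker can be illustrated as follows. This is a sketch only: the class and method names are hypothetical, the content is passed as a byte array for brevity, and MD5 is assumed as the checksum algorithm purely for the example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ChecksumSketch {
    /** Computes a lowercase hex MD5 digest of the given content (algorithm assumed for illustration). */
    static String md5Hex(byte[] content) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(content)) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** True when the freshly calculated checksum matches the one recorded earlier. */
    static boolean isUnchanged(String recordedChecksum, byte[] currentContent) {
        return recordedChecksum.equalsIgnoreCase(md5Hex(currentContent));
    }
}
```

A mismatch between the recorded and recalculated values is the signal that the stored file has changed since the checksum was last recorded.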
DSpace is able to support OpenSearch. For those not acquainted with the standard, here is a very brief introduction, with emphasis on the possibilities it holds for current use and future development.
OpenSearch is a small set of conventions and documents for describing and using 'search engines', meaning any service that returns a set of results for a query. It is nearly ubiquitous, but also nearly invisible, in modern web sites with search capability. If you look at the page source of Wikipedia, Facebook, CNN, etc. you will find buried a link element declaring OpenSearch support. It is very much a lowest-common-denominator abstraction (think Google box), but does provide a means to extend its expressive power. This first implementation for DSpace supports none of these extensions, many of which are of potential value, so it should be regarded as a foundation, not a finished solution. So the short answer is that DSpace appears as a 'search-engine' to OpenSearch-aware software.
Another way to look at OpenSearch is as a RESTful web service for search, very much like SRW/U, but considerably simpler. This comparative loss of power is offset by the fact that it is widely supported by web tools and players: browsers understand it, as do large metasearch tools.
How Can It Be Used
Flexible, interesting RSS Feeds. Because one of the formats that OpenSearch specifies for its results is RSS (or Atom), you can turn any search query into an RSS feed. So if there are keywords highly discriminative of content in a collection or repository, these can be turned into a URL that a feed reader can subscribe to. Taken to the extreme, one could take any search a user makes, and dynamically compose an RSS feed URL for it in the page of returned results. To see an example, if you have a DSpace with OpenSearch enabled, try:
http://dspace.mysite.edu/open-search/?query=<your query>
The default format returned is Atom 1.0, so you should see an Atom document containing your search results.
You can extend the syntax with a few other parameters, as follows:
Parameter | Values |
---|---|
format | atom, rss, html |
scope | handle of a collection or community to restrict the search to |
rpp | number indicating the number of results per page (i.e. per request) |
start | number of page to start with (if paginating results) |
sort_by | number indicating sorting criteria (same as DSpace advanced search values) |
Multiple parameters may be specified on the query string, using the "&" character as the delimiter, e.g.:
http://dspace.mysite.edu/open-search/?query=<your query>&format=rss&scope=123456789/1
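A client could assemble such a URL programmatically. The following is a sketch under stated assumptions: the class and method names are hypothetical, only the query, format and scope parameters are handled, and the scope handle is appended unencoded, as in the example URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class OpenSearchUrlSketch {
    /**
     * Builds an OpenSearch request URL. format and scope are optional and
     * may be null; baseUrl is the site root without a trailing slash.
     */
    static String buildUrl(String baseUrl, String query, String format, String scope) {
        try {
            StringBuilder url = new StringBuilder(baseUrl)
                    .append("/open-search/?query=")
                    .append(URLEncoder.encode(query, "UTF-8"));
            if (format != null) {
                url.append("&format=").append(URLEncoder.encode(format, "UTF-8"));
            }
            if (scope != null) {
                // Assumption: the handle contains only characters that are legal
                // in a query string, so it is appended as-is.
                url.append("&scope=").append(scope);
            }
            return url.toString();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Encoding the query matters because user-entered search terms frequently contain spaces or reserved characters that would otherwise break the URL.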
Configuration is through the dspace.cfg file. See OpenSearch Support for more details.
An embargo is a temporary access restriction placed on content, commencing at time of accession. Its scope or duration may vary, but the fact that it eventually expires is what distinguishes it from other content restrictions. For example, it is not unusual for content destined for DSpace to come with permanent restrictions on use or access based on license-driven or other IP-based requirements that limit access to institutionally affiliated users. Restrictions such as these are imposed and managed using standard administrative tools in DSpace, typically by attaching specific policies to Items or Collections, Bitstreams, etc. The embargo functionality introduced in 1.6, however, includes tools to automate the imposition and removal of restrictions in managed timeframes.
Functionally, the embargo system allows you to attach 'terms' to an item before it is placed into the repository, which express how the embargo should be applied. What do we mean by 'terms' here? They are really any expression that the system is capable of turning into (1) the time the embargo expires, and (2) a concrete set of access restrictions. Some examples:
"2020-09-12" - an absolute date (i.e. the date embargo will be lifted)
"6 months" - a time relative to when the item is accessioned
"forever" - an indefinite, or open-ended embargo
"local only until 2015" - both a time and an exception (public has no access until 2015, local users OK immediately)
"Nature Publishing Group standard" - look-up to a policy somewhere (typically 6 months)
These terms are 'interpreted' by the embargo system to yield a specific date on which the embargo can be removed or 'lifted', and a specific set of access policies. Obviously, some terms are easier to interpret than others (the absolute date really requires none at all), and the 'default' embargo logic understands only the most basic terms (the first and third examples above). But as we will see below, the embargo system provides you with the ability to add in your own 'interpreters' to cope with any terms expressions you wish to have. This date that is the result of the interpretation is stored with the item and the embargo system detects when that date has passed, and removes the embargo ("lifts it"), so the item bitstreams become available. Here is a more detailed life-cycle for an embargoed item:
More Embargo Details
More details on Embargo configuration, including specific examples, can be found in the Embargo section of the documentation.
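To make the idea of interpreting embargo terms concrete, here is a minimal sketch that handles only the two basic kinds of terms mentioned earlier: an absolute date and "forever". It is not the DSpace embargo API; the class, the FOREVER marker and both method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

class EmbargoTermsSketch {
    /** Hypothetical marker for an open-ended embargo: a date that never arrives. */
    static final LocalDate FOREVER = LocalDate.MAX;

    /** Turns a terms expression into the date on which the embargo can be lifted. */
    static LocalDate liftDate(String terms) {
        String trimmed = terms.trim();
        if (trimmed.equalsIgnoreCase("forever")) {
            return FOREVER;
        }
        try {
            // Absolute ISO 8601 dates such as 2020-09-12 need no real interpretation.
            return LocalDate.parse(trimmed);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Cannot interpret embargo terms: " + terms, e);
        }
    }

    /** True once the lift date has been reached, i.e. the embargo should be removed. */
    static boolean shouldLift(LocalDate liftDate, LocalDate today) {
        return !today.isBefore(liftDate);
    }
}
```

Richer terms such as "6 months" would need a custom interpreter that also knows the accession date, which is why the sketch rejects anything it cannot parse rather than guessing.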