The Islandora Newspaper Batch module uses the Islandora Batch framework to provide a command-line (drush) and a GUI (Drupal interface) option for adding a batch file of newspaper issues and pages to an existing Islandora Newspaper object.
Newspaper Batch uses the value in the MODS dateIssued field on each issue to populate the issue browsing display for the newspaper. The data in this field must be formatted as YYYY-MM-DD. If only YYYY is entered, the interface will use the current month and day for the issue.
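Since the browsing display depends on this format, it can be useful to check date values before building a batch. The following is a minimal sketch in Python and is not part of the module; the function name `is_valid_issue_date` is hypothetical.

```python
import re
from datetime import datetime

# Exactly four digits, two digits, two digits, separated by hyphens.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_issue_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    # Reject year-only values, unpadded months or days, and other shapes.
    if not DATE_PATTERN.fullmatch(value):
        return False
    # Reject impossible dates such as month 13 or February 30.
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True
```

A year-only value such as `1928` fails this check, which flags issues that would otherwise be displayed with the current month and day.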
The Newspaper Batch module is designed for digitized newspapers where each page is represented by an individual TIFF image file. These TIFF files, along with derivatives, full text, and metadata, must be arranged in directories according to a very specific structure.
batch.zip
└── issue1
    ├── 001
    │   └── OBJ.tif
    ├── 002
    │   └── OBJ.tif
    └── MODS.xml
- this becomes the MODS record for the issue-level object
batch02.zip
└── issue1
    ├── 001
    │   ├── JP2.jp2
    │   ├── JPG.jpg
    │   ├── OBJ.tif
    │   ├── OCR.txt
    │   └── TN.jpg
    ├── 2
    │   ├── JP2.jp2
    │   ├── JPG.jpg
    │   ├── OBJ.tif
    │   ├── OCR.txt
    │   └── TN.jpg
    └── MODS.xml
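Because the layout is strict, a quick check of each issue directory before zipping can catch mistakes early. The sketch below is an illustration in Python, not part of the module. The function name `check_issue_dir` is hypothetical, and it assumes that every subdirectory of an issue is a page directory and that a missing MODS.xml is worth flagging even though other metadata formats may be accepted.

```python
from pathlib import Path
from typing import List


def check_issue_dir(issue_dir: Path) -> List[str]:
    """Return a list of human-readable problems found in one issue directory."""
    problems = []
    # Assumption: every subdirectory of the issue is a page directory.
    page_dirs = sorted(p for p in issue_dir.iterdir() if p.is_dir())
    if not page_dirs:
        problems.append(f"{issue_dir.name}: no page directories found")
    for page_dir in page_dirs:
        # Each page needs its TIFF; derivatives such as JP2.jp2 are not checked.
        if not (page_dir / "OBJ.tif").is_file():
            problems.append(f"{issue_dir.name}/{page_dir.name}: missing OBJ.tif")
    # Assumption: flag a missing issue-level MODS.xml for review.
    if not (issue_dir / "MODS.xml").is_file():
        problems.append(f"{issue_dir.name}: no MODS.xml")
    return problems
```

An empty list means the issue directory matches the structure shown above; anything else names the directory that needs attention.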
If MODS metadata is not available for issue or page objects, the following formats can be supplied and will be automatically transformed to general MODS and DC.
Other things to note about metadata:
Here is a sample MODS file describing a newspaper issue.
<?xml version="1.0" encoding="UTF-8"?>
<mods xmlns="http://www.loc.gov/mods/v3" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
  <titleInfo>
    <title>Canadian Jewish Review, June 1, 1928</title>
  </titleInfo>
  <originInfo>
    <place>
      <placeTerm>Toronto, Ontario</placeTerm>
    </place>
    <publisher>Canadian Jewish Review</publisher>
    <dateIssued encoding="iso8601">1928-06-01</dateIssued>
  </originInfo>
  <language>
    <languageTerm>eng</languageTerm>
  </language>
  <subject>
    <topic>Jews, Canadian -- Ontario -- Toronto -- History -- Newspapers</topic>
    <topic>Jews, Canadian -- Quebec -- Montreal -- History -- Newspapers</topic>
    <topic>Jews -- History -- 20th century -- Newspapers</topic>
    <topic>Jews -- Canada -- Periodicals</topic>
    <topic>Canada -- History -- 20th century -- Newspapers</topic>
    <topic>Ontario -- History -- 20th century -- Newspapers</topic>
    <topic>Quebec -- History -- 20th century -- Newspapers</topic>
    <topic>Toronto (Ont.) -- History -- 20th century -- Newspapers</topic>
    <topic>Montreal (Que.) -- History -- 20th century -- Newspapers</topic>
  </subject>
  <identifier>Cjewish-1928-06-01</identifier>
</mods>
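To confirm that a MODS file like the sample above carries the fields the issue display relies on, the title and dateIssued values can be read back with a short script. This is a rough sketch using Python's standard library, not something the module provides; the function name `read_issue_metadata` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The MODS namespace declared on the root element of the sample.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def read_issue_metadata(mods_xml: str) -> dict:
    """Return the issue title and dateIssued value, or empty strings if absent."""
    root = ET.fromstring(mods_xml)
    title = root.findtext(
        "mods:titleInfo/mods:title", default="", namespaces=MODS_NS
    )
    date_issued = root.findtext(
        "mods:originInfo/mods:dateIssued", default="", namespaces=MODS_NS
    )
    return {"title": title.strip(), "date_issued": date_issued.strip()}
```

An empty `date_issued` in the result indicates a record that has no dateIssued element in its originInfo.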
To use Newspaper Batch in Islandora:
If you have many ZIP files to ingest, or if the ZIP files are too large to ingest through the interface, you can also batch ingest newspapers from the Drupal command line with Drush.
First, your file(s) must be accessible to the Drush instance. That usually means uploading them to the Islandora server (via scp, ftp, a mounted storage drive, etc.). The --scan_target option (--target in Drush 6 and below) points to either a directory of issue directories or a zip file of issue directories. That is, the directories representing the issues to ingest (or the single issue, if there is only one) must be one level below the directory or zip file used as the --scan_target.
Second, preprocess the file(s). For a full list of the command-line parameters, see "drush help islandora_newspaper_batch_preprocess". The batch options are also described in the Islandora Batch module.
drush -v -u 1 --uri=http://localhost islandora_newspaper_batch_preprocess --type=directory --scan_target=/path/to/issues --namespace=dailyplanet --parent=islandora:dailyplanet
This will populate the queue (stored in the Drupal database) with PID entries. Note that the --parent parameter must be a newspaper title object, not an issue object or a collection object.
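When there are many directories or ZIP files to preprocess, the command above can be assembled programmatically. The sketch below only builds the argument list and does not run Drush. It uses the same flags as the example command; the function name `build_preprocess_command` and the choice to infer `--type` from a `.zip` extension are assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def build_preprocess_command(target: str, namespace: str, parent: str,
                             uri: str = "http://localhost") -> List[str]:
    """Build the drush preprocess command for one scan target."""
    # Assumption: a target ending in .zip is a zip batch, anything else a directory.
    batch_type = "zip" if Path(target).suffix.lower() == ".zip" else "directory"
    return [
        "drush", "-v", "-u", "1", f"--uri={uri}",
        "islandora_newspaper_batch_preprocess",
        f"--type={batch_type}",
        f"--scan_target={target}",
        f"--namespace={namespace}",
        # Must be the PID of a newspaper title object.
        f"--parent={parent}",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once per target, followed by a single ingest run to process the queue.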
Here are the options for the underlying Islandora Batch preprocessing command, as reported by drush help:
```
drush help islandora_batch_scan_preprocess
Examples:
drush -v --user=admin --uri=http://digital.library.yorku.ca islandora_batch_scan_preprocess --content_models=islandora:sp_large_image_cmodel --parent=yul:F0433 --parent_relationship_pred=isMemberOfCollection --type=directory --target=/tmp/batch_ingest
Options:
--content_models
Supports one or multiple comma-separated content models which are all applied to each ingested object.
--namespace
Namespace of objects to create. Defaults to namespace specified in Fedora configuration.
--parent
The collection to which the generated items should be added. Defaults to the root Islandora repository PID.
--parent_relationship_pred
The predicate of the relationship to the parent. Defaults to "isMemberOfCollection".
--parent_relationship_uri
The namespace URI of the relationship to the parent. Defaults to "info:fedora/fedora-system:def/relations-external#".
--target
The target directory or zip file to scan. Requires the full path to your archive from the root directory, e.g. /var/www/drupal/sites/archive.zip. Required.
--type
Either "directory" or "zip". The zip importer is unstable with large files (~2GB). Consider alternative methods such as unzipping your Zip file and using Islandora Batch's `--directory` option. Required.
--zip_encoding
The encoding of filenames contained in ZIP archives: Only relevant with --scan_target=zip. Defaults to the native encoding being used by PHP.
Aliases: ibsp
```
Third, process all items in the batch queue:
drush -v -u 1 --uri=http://localhost islandora_batch_ingest
You may get the warning "Failed to get issued date from MODS for dailyplanet:1".
In that case, everything looks normal after ingesting, but the issue you ingested is missing.
Further documentation for this module is available at the Islandora Newspaper Batch github repository.