Introduction

The Islandora Newspaper Batch module uses the Islandora batch framework to provide command-line (Drush) and GUI (Drupal interface) options for adding a batch file of newspaper issues and pages to an existing Islandora Newspaper object.

Batch-loading newspapers is a two-step process.

  1. Preprocessing: Drupal creates entries in the database for each object (issue and page) that will be added.
  2. Ingest: The data is ingested and derivatives are generated as part of the Islandora batch functions.

Getting started

Preparing files for batch ingest

The Newspaper Batch module is designed for digitized newspapers where each page is represented by an individual TIFF image file. These TIFF files, along with derivatives, full text, and metadata, are arranged in directories that can be turned into a ZIP file for upload into Islandora using the Newspaper Batch functions.

Tips for preparing batch ingest files

Sample single-issue batch folder hierarchy

batch.zip
└── issue1
    ├── 001
    │   └── OBJ.tif
    ├── 002
    │   └── OBJ.tif
    └── MODS.xml - this becomes the MODS record for the issue-level object
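
One way to build this layout from a flat folder of page scans is sketched below. It is an illustration only, not part of the module: the function name arrange_issue, the page_0001.tif style file names, and the temporary demo directory are all assumptions, and the script relies on the scans sorting into page order by file name.

```shell
#!/bin/sh
# Sketch only: arrange a flat folder of page scans into issue/page/OBJ.tif.
# Assumption: the scans sort into page order by file name
# (page_0001.tif, page_0002.tif, ... are hypothetical names).
set -eu

arrange_issue() {
    src_dir=$1     # flat directory of page TIFFs
    issue_dir=$2   # issue folder to create, e.g. batch/issue1
    mods_file=$3   # issue-level MODS record

    mkdir -p "$issue_dir"
    cp "$mods_file" "$issue_dir/MODS.xml"

    n=0
    for tif in "$src_dir"/*.tif; do
        [ -e "$tif" ] || continue   # skip the literal pattern when nothing matches
        n=$((n + 1))
        page_dir=$(printf '%s/%03d' "$issue_dir" "$n")
        mkdir -p "$page_dir"
        cp "$tif" "$page_dir/OBJ.tif"
    done
    echo "$n"                       # number of page folders created
}

# Demo with empty placeholder files in a temporary directory.
work=$(mktemp -d)
mkdir "$work/scans"
: > "$work/scans/page_0001.tif"
: > "$work/scans/page_0002.tif"
printf '<mods/>\n' > "$work/issue.xml"

pages=$(arrange_issue "$work/scans" "$work/batch/issue1" "$work/issue.xml")
echo "arranged $pages pages under $work/batch/issue1"

# If the zip utility is installed, the archive can then be built with:
#   (cd "$work/batch" && zip -r ../batch.zip issue1)
```

Running zip from inside the batch folder keeps issue1 at the top level of the archive, matching the hierarchy shown above.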

Other files, such as JP2.jp2, OCR.txt, and TN.jpg, can be included in each page subfolder as long as their file names correspond to datastream IDs. If derivatives are included, Islandora will not generate new ones, which speeds up ingest.

Sample batch folder hierarchy with derivatives

batch02.zip
└── issue1
    ├── 1
    │   ├── JP2.jp2
    │   ├── JPG.jpeg
    │   ├── OBJ.tif
    │   ├── OCR.asc
    │   └── TN.jpeg
    ├── 2
    │   ├── JP2.jp2
    │   ├── JPG.jpeg
    │   ├── OBJ.tif
    │   ├── OCR.asc
    │   └── TN.jpeg
    └── MODS.xml
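
Before zipping, it can help to confirm that every page folder actually contains its TIFF. The following is a minimal sketch of such a check, not something the module provides: check_issue is a hypothetical helper, it assumes each page folder needs an OBJ.tif, and it only prints a note when MODS.xml is absent because other metadata formats may be supplied instead.

```shell
#!/bin/sh
# Sketch only: report page folders that lack OBJ.tif before building the ZIP.
set -eu

check_issue() {
    issue_dir=$1
    problems=0
    # MODS.xml is only noted, not counted, since other formats may be used.
    [ -f "$issue_dir/MODS.xml" ] || echo "note: no MODS.xml in $issue_dir"
    for page_dir in "$issue_dir"/*/; do
        [ -d "$page_dir" ] || continue
        if [ ! -f "${page_dir}OBJ.tif" ]; then
            echo "missing: ${page_dir}OBJ.tif"
            problems=$((problems + 1))
        fi
    done
    [ "$problems" -eq 0 ]           # non-zero exit status if anything is missing
}

# Demo: page 1 is complete, page 2 has a derivative but no OBJ.tif.
work=$(mktemp -d)
mkdir -p "$work/issue1/1" "$work/issue1/2"
: > "$work/issue1/MODS.xml"
: > "$work/issue1/1/OBJ.tif"
: > "$work/issue1/2/JP2.jp2"

status=0
report=$(check_issue "$work/issue1") || status=$?
echo "$report"
echo "exit status: $status"
```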

Descriptive Metadata

If MODS metadata is not available for issue or page objects, the following formats can be supplied and will be automatically transformed to general MODS and DC.

Other things to note about metadata:

Sample Issue-level MODS.xml file

Here is a sample MODS file describing a newspaper issue.

<?xml version="1.0" encoding="UTF-8"?>
<mods xmlns="http://www.loc.gov/mods/v3" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
    <titleInfo>
      <title>Canadian Jewish Review, June 1, 1928</title>
    </titleInfo>
    <originInfo>
      <place>
        <placeTerm>Toronto, Ontario</placeTerm>
      </place>
      <publisher>Canadian Jewish Review</publisher>
      <dateIssued encoding="iso8601">1928-06-01</dateIssued>
    </originInfo>
    <language>
      <languageTerm>eng</languageTerm>
    </language>
    <subject>
      <topic>Jews, Canadian -- Ontario -- Toronto -- History -- Newspapers</topic>
      <topic>Jews, Canadian -- Quebec -- Montreal -- History -- Newspapers</topic>
      <topic>Jews -- History -- 20th century -- Newspapers</topic>
      <topic>Jews -- Canada -- Periodicals</topic>
      <topic>Canada -- History -- 20th century -- Newspapers</topic>
      <topic>Ontario -- History -- 20th century -- Newspapers</topic>
      <topic>Quebec -- History -- 20th century -- Newspapers</topic>
      <topic>Toronto (Ont.) -- History -- 20th century -- Newspapers</topic>
      <topic>Montreal (Que.) -- History -- 20th century -- Newspapers</topic>
    </subject>
    <identifier>Cjewish-1928-06-01</identifier>
</mods>
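
Because the issue date comes from this file, a quick check of the dateIssued element before ingest may save a failed batch. The sketch below is an assumption-laden illustration: issued_date and is_iso_date are hypothetical helpers, the check is plain text matching with sed rather than real XML parsing (it would miss a prefixed mods:dateIssued element), and it assumes a YYYY-MM-DD value like the one in the sample above. The module's exact date requirements are not documented here.

```shell
#!/bin/sh
# Sketch only: pull the first dateIssued value out of a MODS file and
# test whether it looks like YYYY-MM-DD. Text matching, not XML parsing.
set -eu

issued_date() {
    sed -n 's:.*<dateIssued[^>]*>\([^<]*\)</dateIssued>.*:\1:p' "$1" | head -n 1
}

is_iso_date() {
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Demo against a cut-down copy of the sample record.
work=$(mktemp -d)
cat > "$work/MODS.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<mods xmlns="http://www.loc.gov/mods/v3">
  <originInfo>
    <dateIssued encoding="iso8601">1928-06-01</dateIssued>
  </originInfo>
</mods>
EOF

issued=$(issued_date "$work/MODS.xml")
if is_iso_date "$issued"; then
    echo "dateIssued looks fine: $issued"
else
    echo "check dateIssued in $work/MODS.xml (found: '$issued')"
fi
```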

Using Newspaper Batch from the Drupal interface

To use Newspaper Batch in Islandora:

  1. Log in as a user with batch ingest permissions.
  2. Navigate to a Newspaper Solution Pack object and click Manage.
  3. In the Overview tab, click Newspaper Batch.
  4. Upload the ZIP file and set the appropriate options for this batch, then click Ingest.

Newspaper Batch Ingest options

Using Newspaper Batch from the command line (Drush)

If you have many ZIP files to ingest, or if the ZIP files are too large to ingest through the interface, you can also batch ingest newspapers from the Drupal command line with Drush.

To run the preprocessor from Drush (see drush help islandora_newspaper_batch_preprocess for additional parameters):

drush -v -u 1 --uri=http://localhost islandora_newspaper_batch_preprocess --type=directory --target=/path/to/issues --namespace=dailyplanet --parent=islandora:dailyplanet

This will populate the queue (stored in the Drupal database) with PID entries. Note that the --parent parameter must be a newspaper object, not a collection object.
You can then process all items in the batch queue:

drush -v -u 1 --uri=http://localhost islandora_batch_ingest
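
When several directories need to be queued, the two commands above can be generated in a loop. The sketch below only prints the commands so they can be reviewed first. It uses just the options shown above; the per-year target directories and the preprocess_cmd helper are hypothetical, and paths containing spaces would need extra quoting.

```shell
#!/bin/sh
# Sketch only: print one preprocess command per target directory,
# followed by a single ingest command. Nothing is executed.
set -eu

uri=http://localhost
namespace=dailyplanet
parent=islandora:dailyplanet    # must be a newspaper object, not a collection

preprocess_cmd() {
    printf 'drush -v -u 1 --uri=%s islandora_newspaper_batch_preprocess --type=directory --target=%s --namespace=%s --parent=%s\n' \
        "$uri" "$1" "$namespace" "$parent"
}

plan=$(
    # Hypothetical target directories, one per year of issues.
    for target in /path/to/issues/1928 /path/to/issues/1929; do
        preprocess_cmd "$target"
    done
    printf 'drush -v -u 1 --uri=%s islandora_batch_ingest\n' "$uri"
)

printf '%s\n' "$plan"
# After reviewing the output, it could be executed with:
#   printf '%s\n' "$plan" | sh
```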

Troubleshooting

You may get the warning "Failed to get issued date from MODS for dailyplanet:1".

After ingest, everything looks normal, but the issue you ingested is missing.

Additional Documentation

Further documentation for this module is available at the Islandora Newspaper Batch GitHub repository.