
...

The Islandora Newspaper Batch module uses the Islandora Batch framework to provide a command-line (drush) option and a GUI (Drupal interface) option for adding a batch of newspaper issues and pages to an existing Islandora Newspaper object.

Batch-loading newspapers is a two-step process.

  1. Preprocessing: Drupal creates entries in the database for each object (issue and page) that will be added.
  2. Ingest: The data is ingested and derivatives are generated as part of the Islandora batch functions.

...


Newspaper Batch uses the value in the MODS dateIssued field on each issue to populate the issue browsing display for the newspaper. The data in this field must be formatted as YYYY-MM-DD. If only YYYY is entered, the interface will use the current month and day for the issue.
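Because a year-only value silently picks up the current month and day, it can help to check dateIssued values before building a batch. The following is a minimal sketch in Python, not part of the module; the function name is_valid_date_issued is hypothetical, and it only checks a string that you have already extracted from the MODS record.

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits, separated by hyphens.
_DATE_ISSUED = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_valid_date_issued(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    match = _DATE_ISSUED.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        # date() raises ValueError for impossible dates such as February 30.
        date(year, month, day)
    except ValueError:
        return False
    return True
```

A value such as "1938-02-14" passes, while "1938", "1938-2-14", and "1938-02-30" are rejected.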

Creating a new Newspaper object

  • Newspaper Batch can only be used with an existing Newspaper object (islandora:newspaperCModel). To create a new Newspaper object:
    • Go to http://localhost:8000 and log in.
    • Go to Navigation > Islandora Repository.
    • Click on the Newspaper Collection.
    • Click the Manage tab.
    • Click Add an object to this Collection.
    • Use the default content model, Islandora Newspaper Content Model.
    • If you have MARCXML to submit, select the file, upload it, and click Next. If you do not have MARCXML, just click Next (MARCXML is not required at this step).
    • If you do not want this option to appear again, disable the "Islandora MARCXML" module.
    • Title is the only required field at this stage.
    • Click Ingest. Islandora should confirm the ingest.

Preparing files for batch ingest

The Newspaper Batch module is designed for digitized newspapers where each page is represented by an individual TIFF image file. These TIFF files, along with derivatives, full text, and metadata, must be arranged in directories according to a very specific structure. The directories can then be turned into a ZIP file for upload into Islandora using the Newspaper Batch functions.

Tips for preparing batch ingest files

  • Generally, Islandora performs best with ZIP files smaller than 2 GB. If your files are larger, consider using drush with the --type=directory option.
  • Within the ZIP file or target directory (if using drush and the --type=directory option), each top-level directory represents a newspaper issue.
  • Files within the issue directory will become datastreams on the issue object (e.g. this is where you put issue-level metadata).
  • Directories within the issue directory contain files that will become newspaper page objects.  These are usually named numerically, as they are processed in numerical order.
  • File names must match their respective Islandora datastream IDs. This means that every page image (TIFF) needs to be named "OBJ.tif" in order to be treated as a newspaper page object by Islandora. If you have created derivatives for the page objects, name them after their datastream IDs as well (e.g. TN.jpg, OCR.txt, ...)
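The naming rules above can be checked before a batch is submitted. The following is a minimal sketch in Python, not part of the module; the function name check_issue_directory is hypothetical, and it assumes that a page directory without an OBJ.tif file is a mistake worth reporting.

```python
from pathlib import Path
from typing import List


def check_issue_directory(issue_dir: Path) -> List[str]:
    """Return a list of problems found in one issue directory (empty if none)."""
    problems = []
    # Each subdirectory of the issue directory is treated as one page.
    page_dirs = sorted(p for p in issue_dir.iterdir() if p.is_dir())
    if not page_dirs:
        problems.append(f"{issue_dir.name}: no page directories")
    for page_dir in page_dirs:
        # The page image must be named after the OBJ datastream.
        if not (page_dir / "OBJ.tif").is_file():
            problems.append(f"{issue_dir.name}/{page_dir.name}: missing OBJ.tif")
    return problems
```

Calling check_issue_directory(Path("issue1")) on a correctly prepared issue returns an empty list. A page directory whose TIFF was left under its scanner file name shows up as a "missing OBJ.tif" entry.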

Sample single-issue batch folder hierarchy

...

└── issue1
    ├── 001
    │   └── OBJ.tif
    ├── 002
    │   └── OBJ.tif
    └── MODS.xml - this becomes the MODS record for the issue-level object

 

...

Sample batch folder hierarchy with derivatives

batch02.zip
└── issue1
    ├── 1
    │   ├── JP2.jp2
    │   ├── JPG.jpg
    │   ├── OBJ.tif
    │   ├── OCR.txt
    │   └── TN.jpg
    ├── 2
    │   ├── JP2.jp2
    │   ├── JPG.jpg
    │   ├── OBJ.tif
    │   ├── OCR.txt
    │   └── TN.jpg
    └── MODS.xml
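A hierarchy like the one above can be packaged with any ZIP tool. The following is a minimal sketch using Python's standard zipfile module; the function name zip_issues is hypothetical, and it assumes that source_dir holds the issue directories and that zip_path lies outside source_dir.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_issues(source_dir: Path, zip_path: Path) -> List[str]:
    """Write every file under source_dir into zip_path and return the entry names.

    Paths are stored relative to source_dir, so the issue directories
    end up at the top level of the archive.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(source_dir).as_posix()
                archive.write(path, arcname)
                names.append(arcname)
    return names
```

For example, zip_issues(Path("batch02"), Path("batch02.zip")) stores a file at batch02/issue1/MODS.xml under the entry name issue1/MODS.xml.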

...

  1. Log in as a user with batch ingest permissions.
  2. Navigate to a Newspaper Solution Pack Content Model object and click Manage.
  3. In the Overview tab, click Newspaper Batch.
  4. Upload the ZIP file and set the appropriate options for this batch, then click Ingest.

...

If you have many ZIP files to ingest, or if the ZIP files are too large to ingest through the interface, you can also batch ingest newspapers from the Drupal command line with Drush. 

First, your file(s) need to be accessible to the Drush instance. That usually means uploading them to the Islandora server (via scp, ftp, a mounted storage drive, etc.). The "target" (--scan_target in Drush 7 and above) is either a directory of issue directories or a ZIP file of issue directories.

Second, preprocess the ZIP file(s) or directory. For a full list of the command-line parameters, see "drush help islandora_newspaper_batch_preprocess". The batch options are also described in the Islandora Batch module.

drush -v -u 1 --uri=http://localhost islandora_newspaper_batch_preprocess --type=directory --scan_target=/path/to/issues --namespace=dailyplanet --parent=islandora:dailyplanet

This will populate the queue (stored in the Drupal database) with PID entries. Note that the --parent parameter must be a newspaper title object, not an issue object or a collection object.
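If you preprocess many targets from a script, the command line can be assembled programmatically. The following is a minimal sketch in Python that only builds and prints the command shown above and does not run it. The function name build_preprocess_command and its parameter names are hypothetical; the drush options are limited to those that appear in the example command.

```python
import shlex
from typing import List


def build_preprocess_command(
    scan_target: str,
    namespace: str,
    parent: str,
    uri: str = "http://localhost",
    target_type: str = "directory",
) -> List[str]:
    """Return the preprocess command as an argument list (for example for subprocess.run)."""
    return [
        "drush", "-v", "-u", "1",
        f"--uri={uri}",
        "islandora_newspaper_batch_preprocess",
        f"--type={target_type}",
        f"--scan_target={scan_target}",
        f"--namespace={namespace}",
        # Must be a newspaper title object, not an issue or a collection.
        f"--parent={parent}",
    ]


command = build_preprocess_command(
    "/path/to/issues", "dailyplanet", "islandora:dailyplanet"
)
# shlex.join quotes arguments where needed, so the result can be pasted into a shell.
print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.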

Third, process all items in the batch queue:

...