...

  • Generally, Islandora performs best with ZIP files smaller than 2 GB. If your files are larger, consider using Drush with the --type=directory option.
  • Within the ZIP file or target directory (if you use Drush with --type=directory), each top-level directory represents a newspaper issue.
  • Files directly within an issue directory become datastreams on the issue object. This is where issue-level metadata goes, such as a MODS file with the date of the issue.
  • Each directory within an issue directory holds the files for one newspaper page object. Page directories are usually named with numbers (1, 2, 3, ...) because they are processed in numerical order.
  • File names must match their respective Islandora datastream IDs. This means that every page image (TIFF) must be renamed "OBJ.tif". If you have created derivatives for the page objects, name them after their datastream IDs as well (e.g. TN.jpg, OCR.txt, ...).
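The layout above can be produced with a short shell function. This is a minimal sketch, not part of Islandora: the function name make_issue and the source folder are hypothetical, and it assumes each issue's page scans are .tif files whose sorted file names match page order, with the issue's MODS.xml in the same folder.

```shell
# Hypothetical helper: build one issue directory in the batch layout.
# Assumes $1 holds the page scans (*.tif, sorted names = page order)
# plus the issue-level MODS.xml; $2 is the issue directory to create.
make_issue() {
  src="$1"
  dest="$2"
  mkdir -p "$dest" || return 1
  # Files directly in the issue directory become issue datastreams.
  cp "$src/MODS.xml" "$dest/MODS.xml" || return 1
  n=1
  # Pathname expansion is sorted, so zero-pad source names
  # (p01, p02, ... p10) to keep the sort in page order.
  for tif in "$src"/*.tif; do
    [ -e "$tif" ] || continue   # the pattern matched nothing
    mkdir -p "$dest/$n" || return 1
    # File names must match datastream IDs, so each page image is OBJ.tif.
    cp "$tif" "$dest/$n/OBJ.tif" || return 1
    n=$((n + 1))
  done
}
```

For example, run make_issue scans/2015-03-01 batch/2015-03-01 once per issue, then zip the contents of batch (not the batch folder itself), e.g. (cd batch && zip -r ../newspapers.zip .), so that the issue directories sit at the top level of the ZIP file.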

...

If you have many ZIP files to ingest, or if the ZIP files are too large to ingest through the interface, you can also batch ingest newspapers from the Drupal command line with Drush. 

First, your file(s) need to be accessible to Drush, which usually means uploading them to the Islandora server (via scp, FTP, a mounted storage drive, etc.). The target, given with the --scan_target option (older setups used --target), is either a directory of issue directories or a ZIP file of issue directories. That is, the directories representing the issues to ingest (or the single issue) must be exactly one level below the directory or ZIP file used as the --scan_target.
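Before preprocessing, it may help to confirm that the issue directories really are one level below the target. The hypothetical check_target function below is a sketch for a directory target only (a ZIP file would need to be unpacked first). It assumes every issue has at least one page directory containing OBJ.tif, and it treats loose files at the top level as a likely mistake; that last rule is an assumption, not documented Islandora behavior.

```shell
# Hypothetical pre-flight check for a directory --scan_target.
# Prints each problem found and returns non-zero if there are any.
check_target() {
  target="$1"
  problems=0
  issues=0
  for issue in "$target"/*; do
    [ -e "$issue" ] || continue   # empty target: the pattern matched nothing
    if [ ! -d "$issue" ]; then
      echo "top-level entry is not an issue directory: $issue"
      problems=$((problems + 1))
      continue
    fi
    issues=$((issues + 1))
    pages=0
    for page in "$issue"/*/; do
      if [ -f "${page}OBJ.tif" ]; then
        pages=$((pages + 1))
      fi
    done
    if [ "$pages" -eq 0 ]; then
      # Typical cause: issues nested two levels deep under a wrapper folder.
      echo "no page directories with OBJ.tif in: $issue"
      problems=$((problems + 1))
    fi
  done
  if [ "$issues" -eq 0 ]; then
    echo "no issue directories found in: $target"
    problems=$((problems + 1))
  fi
  [ "$problems" -eq 0 ]
}
```

For example, check_target /data/newspapers/batch prints nothing and succeeds when the layout matches; a folder that wraps the issue directories inside one extra directory is reported instead.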

Second, preprocess the file(s). For a full list of the command-line parameters, run "drush help islandora_newspaper_batch_preprocess". The batch options are also described in the Islandora Batch module documentation.

...

This will populate the queue (stored in the Drupal database) with PID entries. Note that the --parent parameter must be a newspaper title object, not an issue object or a collection object.

Third, process all items in the batch queue:

drush -v -u 1 --uri=http://localhost islandora_batch_ingest
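When there are many ZIP files, the preprocess and ingest steps can be wrapped in a loop. This is a sketch under several assumptions: the ZIP files are already on the server, all of them belong under one newspaper title object, --type=zip is the right type value, and no other preprocess options are needed. The function name ingest_zips and the PID in the example are hypothetical, so check "drush help islandora_newspaper_batch_preprocess" for the options your setup requires.

```shell
# Hypothetical wrapper: queue several ZIP files under one newspaper
# title object, then ingest everything in the queue in a single run.
# Usage: ingest_zips PARENT_TITLE_PID ZIP_FILE...
ingest_zips() {
  parent="$1"
  shift
  for zip in "$@"; do
    # Each preprocess call adds one ZIP file's issues and pages to the queue.
    drush -v -u 1 --uri=http://localhost islandora_newspaper_batch_preprocess \
      --type=zip --scan_target="$zip" --parent="$parent" || return 1
  done
  # One ingest run then processes all items in the batch queue.
  drush -v -u 1 --uri=http://localhost islandora_batch_ingest
}
```

For example, ingest_zips islandora:my_title /data/newspapers/*.zip. Preprocessing every file before a single ingest run keeps the slow ingest step in one place, and the loop stops at the first ZIP file that fails to preprocess.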

...