Current Release
This documentation covers the latest release of Islandora 7.x. For the very latest in Islandora, we recommend Islandora 8.

Overview

The PDF Solution Pack module adds functionality to Islandora for ingesting and viewing PDF files. It uses the ImageMagick library and module to create derivative thumbnail and preview images. Because of the text-based nature of PDF files, it can also be used to create or append easily searchable text datastreams to the object, which can later be configured through Solr to appear in searches.

Dependencies

  • Islandora

  • Tuque

  • ImageMagick is required to create derivatives. (Debian/Ubuntu: sudo apt-get install imagemagick)

  • pdftotext is required to automatically create a FULL_TEXT datastream. (Debian/Ubuntu: sudo apt-get install poppler-utils)

  • Ghostscript is required to create a PDF/A version of uploaded PDFs. (Debian/Ubuntu: sudo apt-get install ghostscript)
  • ImageMagick Drupal module
    • Ensure that the full path to ImageMagick's convert is specified in the Image Toolkit (admin/config/media/image-toolkit).
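
One quick way to confirm that the command-line dependencies are installed is to look each one up on the PATH. The following is a minimal Python sketch and is not part of Islandora. The function name find_tools is hypothetical, and the executable names (convert, pdftotext, gs) are assumed to be the ones the Debian/Ubuntu packages install; they may differ on other systems.

```python
import shutil

# Executable name -> Debian/Ubuntu package that usually provides it.
# Assumption: other platforms may use different names or locations.
REQUIRED_TOOLS = {
    "convert": "imagemagick",
    "pdftotext": "poppler-utils",
    "gs": "ghostscript",
}


def find_tools(tools=REQUIRED_TOOLS):
    """Map each executable name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    for name, path in find_tools().items():
        if path:
            print(f"{name}: {path}")
        else:
            print(f"{name}: not found (try installing {REQUIRED_TOOLS[name]})")
```

The path reported for convert is the value to enter in the Image Toolkit settings, and the path reported for pdftotext is the one the text extraction option asks for.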

Downloads

Release Notes and Downloads

Configuration

The configuration options for the PDF Solution Pack module can be found at http://path.to.your.site/admin/islandora/solution_pack_config/pdf, and include the following:

Text

Users can either upload a text file of their own, or allow Islandora to extract one from the PDF. Text accompanying the PDF is stored as the FULL_TEXT datastream. If both options are checked under the Text configuration section, and a valid path to pdftotext is entered, preference will be given to a supplied text file on ingest.

  • Allow users to upload text file with PDFs: This file must be plain text stored in .txt format.
  • Extract text streams from PDFs using pdftotext: Checking this box displays an option to enter the path to the pdftotext executable. This package is not installed by default on most servers and must be installed manually for this option to work. Check the pdftotext dependency page for more information.
  • Create a PDF/A version of any uploaded PDF: PDF/A is a restrictive standard that prohibits the more easily broken components of the PDF spec, such as fillable forms and DRM. The PDF/A derivative will not be used for display. Requires Ghostscript to be installed on the server.
  • Use dUseCIEColor when generating PDFA datastream: Whether the -dUseCIEColor switch should be passed to Ghostscript when creating the PDF/A version. Not recommended for Ghostscript versions 9.11 or higher.
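
The options above can be sketched in Python as follows. This illustrates the behaviour described in this section and is not the module's actual code. All function and parameter names are hypothetical, and the Ghostscript flags shown are a common way to request PDF/A output, not necessarily the exact command line the module runs.

```python
def full_text_source(has_uploaded_text, allow_upload, extract_enabled, pdftotext_path):
    """Return where FULL_TEXT would come from: 'upload', 'pdftotext', or None."""
    if allow_upload and has_uploaded_text:
        return "upload"  # a supplied text file is preferred on ingest
    if extract_enabled and pdftotext_path:
        return "pdftotext"
    return None


def pdftotext_command(pdftotext_path, pdf_path, txt_path):
    """Build a pdftotext call: pdftotext <PDF-file> <text-file>."""
    return [pdftotext_path, pdf_path, txt_path]


def pdfa_command(pdf_path, out_path, use_cie_color=False, gs_path="gs"):
    """Build a Ghostscript call that writes a PDF/A rendition (assumed flags)."""
    cmd = [gs_path, "-dPDFA", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite"]
    if use_cie_color:
        # Mirrors the configuration checkbox; not recommended for 9.11 or higher.
        cmd.append("-dUseCIEColor")
    cmd += [f"-sOutputFile={out_path}", pdf_path]
    return cmd


if __name__ == "__main__":
    print(full_text_source(True, True, True, "/usr/bin/pdftotext"))
    print(" ".join(pdfa_command("in.pdf", "out_pdfa.pdf")))
```

The command lists can be passed to subprocess.run on a server where the tools are installed; the file names in the example are placeholders.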

Thumbnail and Preview

These options set the width, height, and colorspace parameters that will be sent to ImageMagick when generating Thumbnail and Preview derivatives for the PDF. ImageMagick will attempt to create these using the first page of the document. Changing the width or height will change the image size, but not the aspect ratio, of the derivatives being created.
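
To illustrate how the width and height act as a bounding box while the aspect ratio is kept, here is a minimal Python sketch. The function names are hypothetical, the rounding may differ slightly from ImageMagick's own, and the convert arguments are an assumed equivalent of what the settings describe, not the module's exact command.

```python
def fit_within(src_w, src_h, max_w, max_h):
    """Scale (src_w, src_h) to fit inside max_w x max_h, keeping the aspect ratio."""
    scale = min(max_w / src_w, max_h / src_h)
    return max(1, round(src_w * scale)), max(1, round(src_h * scale))


def derivative_command(pdf_path, out_path, width, height,
                       colorspace="sRGB", convert_path="convert"):
    """Build a convert call that renders the first page of the PDF as an image."""
    return [
        convert_path,
        f"{pdf_path}[0]",               # [0] selects the first page
        "-colorspace", colorspace,
        "-resize", f"{width}x{height}",  # fits within the box, ratio preserved
        out_path,
    ]


if __name__ == "__main__":
    # A US Letter page (612 x 792 points) fitted into a 200 x 200 box.
    print(fit_within(612, 792, 200, 200))
    print(" ".join(derivative_command("in.pdf", "tn.jpg", 200, 200)))
```

For a portrait page the height is the limiting dimension, so a 200 x 200 setting yields an image that is 200 pixels tall and proportionally narrower.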


Viewers

The PDF Solution Pack can use the PDF.js viewer to display PDF documents inline. To enable it, navigate to the PDF Solution Pack's configuration page (admin/islandora/solution_pack_config/pdf) and select PDF.js as the viewer.

Content Models, Prescribed Datastreams and Forms

The PDF Solution Pack comes with the following objects in http://path.to.your.site/admin/islandora/solution_pack_config/solution_packs:

  • Islandora PDF Content Model (islandora:sp_pdf)
  • PDF Collection (islandora:sp_pdf_collection)

An object created using the PDF Solution Pack's content model will have the following datastreams:

  • RELS-EXT: Default Fedora relationship metadata
  • MODS: MODS metadata record created during ingest
  • DC: Dublin Core record
  • OBJ: Original PDF file uploaded
  • TN: Thumbnail image created by ImageMagick during ingest
  • PREVIEW: Preview image created by ImageMagick during ingest
  • FULL_TEXT: Optional datastream either uploaded during ingest, or created by the pdftotext executable
  • PDFA: Optional archival datastream created by the Ghostscript executable
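
As a hedged illustration, the datastream list above can be turned into a simple check on an ingested object. The sketch below is plain Python with hypothetical names. It assumes the datastream IDs have already been retrieved by some other means, and it treats every datastream not marked optional above as always expected, which is an assumption.

```python
# Datastream IDs taken from the list above.
EXPECTED = {"RELS-EXT", "MODS", "DC", "OBJ", "TN", "PREVIEW"}
OPTIONAL = {"FULL_TEXT", "PDFA"}


def check_datastreams(dsids):
    """Compare an object's datastream IDs against the PDF content model list."""
    present = set(dsids)
    return {
        "missing": sorted(EXPECTED - present),
        "optional_present": sorted(OPTIONAL & present),
        "unexpected": sorted(present - EXPECTED - OPTIONAL),
    }


if __name__ == "__main__":
    report = check_datastreams(["RELS-EXT", "MODS", "DC", "OBJ", "TN", "FULL_TEXT"])
    print(report)
```

A missing TN or PREVIEW usually points at the ImageMagick setup, while a missing FULL_TEXT or PDFA only matters if the corresponding option is enabled.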

The PDF Solution Pack comes with the PDF MODS Form.
