This documentation refers to an earlier version of Islandora. https://wiki.duraspace.org/display/ISLANDORA/Start is current.

Overview

pdftotext is a utility that comes as part of the Foolabs Xpdf package. It is used by the PDF Solution Pack to extract text from text-based PDFs so that it can be appended to the object as a FULL_TEXT datastream.
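As a rough illustration of the extraction step described above, the following shell sketch wraps the basic pdftotext invocation, which takes a PDF file followed by the text file to write. The function name extract_full_text and the file names are hypothetical, chosen only for this example; this is not necessarily how the PDF Solution Pack calls the utility.

```shell
# Sketch: extract the text layer of a PDF into a plain-text file.
# The function name and file names are hypothetical, for illustration only.
extract_full_text() {
  pdf="$1"   # path to the source PDF
  out="$2"   # path of the text file to write

  # Fail early if the input is missing, before invoking pdftotext.
  if [ ! -f "$pdf" ]; then
    echo "no such PDF: $pdf" >&2
    return 2
  fi

  # Basic invocation: pdftotext <PDF-file> <text-file>
  pdftotext "$pdf" "$out"
}

# Example call (hypothetical file names):
#   extract_full_text document.pdf FULL_TEXT.txt
```

The resulting text file is the kind of content that would be stored as the FULL_TEXT datastream.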

Provisions

Downloads

pdftotext is installed as part of Xpdf, which is available from Foolabs' official site, http://www.foolabs.com/xpdf/download.html. Binary installers for Windows and Mac are provided there. On Linux, you can compile Xpdf from source or use the binaries from the site, but it is usually much simpler to install it through your distribution's package manager. On Debian- and Ubuntu-based systems, run the following with root privileges:

apt-get install xpdf-utils
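After installing, it can be worth confirming that the utility is reachable from the shell. The sketch below assumes a POSIX shell; require_tool is a hypothetical helper name introduced here and is not part of Xpdf or Islandora.

```shell
# Sketch: report whether a command is available on the PATH.
# require_tool is a hypothetical helper name, not part of Xpdf or Islandora.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "$1 not found on PATH" >&2
  return 1
}

# Check for pdftotext and print a short status message either way.
if require_tool pdftotext; then
  echo "pdftotext is available"
else
  echo "pdftotext is missing; install Xpdf first"
fi
```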

Usage

More information on how to integrate pdftotext with Islandora can be found on the PDF Solution Pack page.
