Comment: Updated links for tesseract's move to GitHub

...

Tesseract is an OCR engine that was developed at HP Labs between 1985 and 1995; it is currently managed by a team at Google. The latest stable release can be found on GitHub: https://github.com/tesseract-ocr/tesseract/releases. A binary installer exists for Windows, and specific instructions for installing on a Mac through MacPorts can be found in the Tesseract readme here: https://github.com/tesseract-ocr/tesseract/wiki/ReadMe. Linux users, and anyone else compiling Tesseract from source, will need to make sure that the Leptonica library is also installed, along with the appropriate build tools.
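Once Tesseract is installed, it can be useful to confirm that the executable is reachable before configuring anything else. The following is a minimal Python sketch, not part of Islandora itself; it assumes the executable is named tesseract and is on the system PATH, and the helper name tesseract_version is purely illustrative.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line printed by `<executable> --version`,
    or None if the executable cannot be found on the PATH.

    The default name "tesseract" is an assumption; adjust it if your
    installation uses a different name or an absolute path.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Depending on the release, version info may go to stdout or stderr,
    # so fall back to stderr when stdout is empty.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version if version else "Tesseract was not found on the PATH.")
```

A result of None usually means the install directory still needs to be added to the PATH, or that the build from source did not complete.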

...

Tesseract requires little configuration out of the box. That said, Islandora supports the installation of multiple languages for OCR processing, and may even require English language support. These additional languages can be found on Tesseract's download page.

To install additional languages into Islandora, you will need to know the path to your Tesseract installation's 'tessdata' folder. On Windows, this will tend to be C:\Program Files (x86)\Tesseract OCR\tessdata if you used the Tesseract website's own installer. On Mac, any language can be installed with MacPorts by running sudo port install tesseract-<langcode>; a list of available langcodes can be found on the MacPorts tesseract page. On Linux, the path will vary from distribution to distribution, but will often be /usr/local/share/tessdata or /usr/share/tessdata. Once you have found the correct folder,

...
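The search for the 'tessdata' folder described above can be sketched as a short Python script. This is only an illustration: the candidate list contains just the locations named on this page, the function names are hypothetical, and it assumes that each installed language appears in the folder as a <langcode>.traineddata file. Your system may keep the folder somewhere else entirely.

```python
from pathlib import Path

# Candidate locations taken from the text above; none is guaranteed to exist.
CANDIDATE_DIRS = [
    r"C:\Program Files (x86)\Tesseract OCR\tessdata",
    "/usr/local/share/tessdata",
    "/usr/share/tessdata",
]


def find_tessdata(candidates=CANDIDATE_DIRS):
    """Return the candidate paths that exist as directories on this machine."""
    return [Path(p) for p in candidates if Path(p).is_dir()]


def installed_languages(tessdata_dir):
    """List language codes in a tessdata folder, assuming one
    <langcode>.traineddata file per installed language."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


if __name__ == "__main__":
    found = find_tessdata()
    if not found:
        print("No tessdata folder found in the usual locations.")
    for folder in found:
        print(folder, installed_languages(folder))
```

If the script reports no folder, searching the filesystem for a directory named tessdata is a reasonable next step.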