Here are the expanded comments I got from Fitz about his scripts (my starting point). I wrote these notes from a point of ignorance about
Fitz's scripts:
1. html_to_csv.rb
- basically, the FTK html output has information that is needed by
multiple steps in the ingest scripts, and putting that info into a csv
format is easier for processing than continually parsing it out of the
FTK html.
# Parses the Forensic Toolkit (FTK) Bookmark html files
# and creates a csv file of their aggregation for importing into fedora.
#
# These bookmark files have technical metadata, as well as descriptive
# metadata that follows a scheme set up in the FTK application by Peter Chan.
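To make the idea concrete, here is a rough sketch of this step, not Fitz's actual code. It assumes each bookmark html file stores its metadata as simple two-cell table rows (label, value); the real FTK report layout may well differ. The field labels and the method names extract_fields and bookmarks_to_csv are made up for illustration.

```ruby
require "csv"

# Hypothetical field labels; the real FTK bookmark report may use others.
FIELDS = ["Name", "Path", "Exported as"].freeze

# Pull label/value pairs out of simple two-cell table rows.
# Assumes rows look like <tr><td>label</td><td>value</td></tr>.
def extract_fields(html)
  pairs = html.scan(%r{<tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>}m)
  pairs.to_h { |label, value| [label.strip, value.strip] }
end

# Build a CSV string with a header row and one row per bookmark document.
def bookmarks_to_csv(html_docs)
  CSV.generate do |csv|
    csv << FIELDS
    html_docs.each do |html|
      record = extract_fields(html)
      csv << FIELDS.map { |field| record[field] }
    end
  end
end
```

The later steps can then read the csv with CSV.foreach instead of re-parsing the html each time. A regex is used here only to keep the sketch free of extra gems; a real html parser would be more robust.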
2. reorg_directory.rb
# Takes the output from html_to_csv.rb (containing the aggregation
# of desired data from the FTK bookmark html files), and creates a populated
# directory structure easier for FEDORA to ingest.
#
# A directory is made for each distinct Fedora object (a file per FTK bookmark
# "exportAs" information), containing the relevant FTK output files and the
# Transit Solution HTML output files.
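As a rough illustration of the reorganization step (again a sketch under assumptions, not the real script): read the csv, make one directory per object, and copy the exported file into it. The column names object_id and export_as and the method name reorganize are hypothetical; the actual csv headers are not given in these notes.

```ruby
require "csv"
require "fileutils"

# Column names ("object_id", "export_as") are illustrative assumptions,
# not the headers the real html_to_csv.rb output uses.
def reorganize(csv_path, source_root, target_root)
  CSV.foreach(csv_path, headers: true) do |row|
    # One directory per Fedora object.
    object_dir = File.join(target_root, row["object_id"])
    FileUtils.mkdir_p(object_dir)

    # Copy the exported file when it exists; missing files are skipped.
    source_file = File.join(source_root, row["export_as"])
    FileUtils.cp(source_file, object_dir) if File.file?(source_file)
  end
end
```

Skipping missing files quietly is a choice made only to keep the sketch short; a real run would probably want to log them.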
3. convert_objects.rb
- this is the meat of the file preparation.
# Given the output of reorg_directory.rb, which is a parent directory containing
# a (sub)directory for each Fedora object to be created (one per collection source file),
# this script takes the Transit Solution HTML file within each Fedora object directory,
# converts it to a postscript file,
# then converts the postscript file into a PDF and multiple per-page jp2000 and text files
# for ingest into fedora. These files are output to the appropriate Fedora
# object directory with appropriate file extensions.
#
# This script requires:
# html2ps perl script and the sample profile (http://user.it.uu.se/~jan/html2ps.html)
# perl (for html2ps script)
# ps2pdf to convert PostScript to PDF (http://www.ps2pdf.com/)
# ghostscript?
# ImageMagick (with jasper + jp2000 libraries installed - http://www.imagemagick.org)
# pdftotext (from poppler-utils http://poppler.freedesktop.org/)
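The conversion chain described above could be sketched roughly like this. It is my own guess at the shape of it, not what convert_objects.rb actually does: the output file naming (base_1.jp2, base_1.txt, ...) is invented, and the page count is passed in as a parameter because these notes do not say how the script finds it. The method names conversion_commands and run_all are hypothetical.

```ruby
# Returns the external commands for one HTML file as argument arrays.
# Output names are illustrative; the real script may name files differently.
def conversion_commands(html_path, page_count)
  base = html_path.sub(/\.html?\z/i, "")
  ps   = "#{base}.ps"
  pdf  = "#{base}.pdf"

  commands = [
    ["html2ps", "-o", ps, html_path], # HTML -> PostScript
    ["ps2pdf", ps, pdf]               # PostScript -> PDF
  ]

  (1..page_count).each do |page|
    # ImageMagick counts PDF pages from 0, pdftotext from 1.
    commands << ["convert", "#{pdf}[#{page - 1}]", "#{base}_#{page}.jp2"]
    commands << ["pdftotext", "-f", page.to_s, "-l", page.to_s,
                 pdf, "#{base}_#{page}.txt"]
  end
  commands
end

# Run each command in order, stopping at the first failure.
def run_all(commands)
  commands.each do |cmd|
    system(*cmd) or raise "command failed: #{cmd.join(' ')}"
  end
end
```

Building the command list separately from running it makes it easy to print the commands and check them before anything touches the files, e.g. run_all(conversion_commands("obj1/doc.html", 3)) once they look right.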
Hi everyone,
I had to do some digging through my backups, but I've resurrected the scripts I used to convert and ingest items into the AIMS application.
I've put them on GitHub over here --> https://github.com/cfitz/aims_scripts
...