
Here are the expanded comments I wrote from a point of ignorance on Fitz's scripts:

1.  html_to_csv.rb
- basically, the FTK HTML output has information that is needed by multiple steps in the ingest scripts, and putting that info into CSV format makes it easier to process than continually parsing it out of the FTK HTML.

# Parses the Forensic Toolkit (FTK) bookmark HTML files
#   and creates a CSV file of their aggregation for importing into Fedora.
#
# These bookmark files have technical metadata, as well as descriptive
#  metadata that follows a scheme set up in the FTK application by Peter Chan.
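A minimal Ruby sketch of the parse-and-aggregate idea, assuming (hypothetically) that each FTK bookmark file lays out its metadata as two-column table rows (`<td>field</td><td>value</td>`). The real FTK export layout is not shown here, so `ROW_PATTERN` is an assumption to adapt, and `parse_bookmark`, `strip_tags`, and `aggregate_to_csv` are illustrative names, not functions from html_to_csv.rb. The aggregation step uses the union of all field names as the CSV header, so a bookmark missing a field gets an empty cell rather than shifting the columns.

```ruby
require "csv"

# ASSUMPTION: each metadata row looks like <tr><td>field</td><td>value</td></tr>.
# Adjust this pattern to the actual FTK bookmark HTML.
ROW_PATTERN = %r{<tr>\s*<td[^>]*>(.*?)</td>\s*<td[^>]*>(.*?)</td>\s*</tr>}mi

# Remove nested tags (e.g. <b>) and surrounding whitespace from a cell.
def strip_tags(text)
  text.gsub(/<[^>]+>/, "").strip
end

# Extract field/value pairs from one bookmark HTML file into a Hash.
def parse_bookmark(html)
  html.scan(ROW_PATTERN).each_with_object({}) do |(field, value), record|
    record[strip_tags(field)] = strip_tags(value)
  end
end

# Aggregate many bookmark records into one CSV string.
# The header row is the union of every field name seen.
def aggregate_to_csv(records)
  headers = records.flat_map(&:keys).uniq
  CSV.generate do |csv|
    csv << headers
    records.each { |record| csv << headers.map { |h| record[h] } }
  end
end

# Example usage (hypothetical paths):
#   records = Dir.glob("ftk_bookmarks/*.html").map { |f| parse_bookmark(File.read(f)) }
#   File.write("bookmarks.csv", aggregate_to_csv(records))
```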

2.  reorg_directory.rb

# Takes the output from html_to_csv.rb (containing the aggregation
# of desired data from the FTK bookmark HTML files) and creates a populated
# directory structure that is easier for Fedora to ingest.
#
# A directory is made for each distinct Fedora object (one per file, as given
#  by the FTK bookmark "exportAs" information), containing the relevant FTK
#   output files and the Transit Solution HTML output files.
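The reorganization step could look roughly like this Ruby sketch. It assumes the CSV has an `exportAs` column naming each Fedora object plus two hypothetical columns, `ftk_file` and `transit_html`, holding file names relative to a source directory; the real CSV produced by html_to_csv.rb may use different headers, and `reorganize` is an illustrative name.

```ruby
require "csv"
require "fileutils"

# ASSUMED columns (hypothetical, not confirmed by the source):
#   exportAs     - name of the Fedora object the row belongs to
#   ftk_file     - FTK output file for that object, relative to source_dir
#   transit_html - Transit Solution HTML file, relative to source_dir
FILE_COLUMNS = %w[ftk_file transit_html].freeze

# Create one directory per Fedora object under out_dir and copy that
# object's source files into it. Returns the object directories created.
def reorganize(csv_path, source_dir, out_dir)
  object_dirs = []
  CSV.foreach(csv_path, headers: true) do |row|
    next if row["exportAs"].to_s.empty?
    # basename keeps a stray "/" in exportAs from escaping out_dir.
    object_dir = File.join(out_dir, File.basename(row["exportAs"]))
    FileUtils.mkdir_p(object_dir)
    FILE_COLUMNS.each do |col|
      name = row[col]
      next if name.nil? || name.empty?
      FileUtils.cp(File.join(source_dir, name), object_dir)
    end
    object_dirs << object_dir
  end
  object_dirs.uniq
end
```

Keeping one directory per object means each later step (such as the conversion in convert_objects.rb) only has to look inside a single folder.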

3.  convert_objects.rb
- this is the meat of the file preparation.

# Given the output of reorg_directory.rb, which is a parent directory containing
#  a (sub)directory for each Fedora object to be created (one per collection source file),
#  this script takes the Transit Solution HTML file within each Fedora object directory,
#   converts it to a PostScript file,
#  then converts the PostScript file into a PDF and multiple per-page JPEG 2000 (jp2) and text files
#  for ingest into Fedora. These files are output to the appropriate Fedora
#  object directory with appropriate file extensions.
#
# This script requires:
#    html2ps Perl script and the sample profile (http://user.it.uu.se/~jan/html2ps.html)
#    Perl (for the html2ps script)
#    ps2pdf to convert PostScript to PDF (http://www.ps2pdf.com/)
#      (ps2pdf is a wrapper script distributed with Ghostscript)
#    ImageMagick (with the JasPer JPEG 2000 library installed - http://www.imagemagick.org)
#    pdftotext (from poppler-utils, http://poppler.freedesktop.org/)
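The conversion chain described above can be sketched in Ruby as a list of argument arrays, one per external tool call. The flags used are standard options of those tools (html2ps `-f` profile and `-o` output, ImageMagick's `[index]` page selector, pdftotext `-f`/`-l` first and last page). The output file names, the profile path `"sample"`, and the use of `pdfinfo` (also in poppler-utils) to read the page count are assumptions; convert_objects.rb may do this differently. `missing_tools` is a hypothetical pre-flight check for the "requires" list.

```ruby
require "open3"

# HTML -> PostScript -> PDF. File names and profile path are hypothetical.
def document_commands(html, dir, profile: "sample")
  ps  = File.join(dir, "object.ps")
  pdf = File.join(dir, "object.pdf")
  [
    ["html2ps", "-f", profile, "-o", ps, html], # HTML -> PostScript, styled by the profile
    ["ps2pdf", ps, pdf]                         # PostScript -> PDF (Ghostscript)
  ]
end

# One JPEG 2000 image and one text file per PDF page.
def page_commands(pdf, dir, pages)
  (1..pages).flat_map do |n|
    [
      # ImageMagick selects a PDF page with a 0-based [index] suffix.
      ["convert", "#{pdf}[#{n - 1}]", File.join(dir, "page-#{n}.jp2")],
      # pdftotext's -f/-l (first/last page) are 1-based.
      ["pdftotext", "-f", n.to_s, "-l", n.to_s, pdf, File.join(dir, "page-#{n}.txt")]
    ]
  end
end

# Read the page count from the "Pages:" line of pdfinfo output.
def page_count(pdfinfo_output)
  pdfinfo_output[/^Pages:\s+(\d+)/, 1].to_i
end

# Return the required executables that cannot be found on PATH.
def missing_tools(tools = %w[html2ps perl ps2pdf convert pdftotext pdfinfo])
  dirs = ENV.fetch("PATH", "").split(File::PATH_SEPARATOR)
  tools.reject do |tool|
    dirs.any? do |d|
      path = File.join(d, tool)
      File.file?(path) && File.executable?(path)
    end
  end
end

# Run the whole chain for one Fedora object directory; raises if a step fails.
def convert_object(html, dir)
  document_commands(html, dir).each { |cmd| system(*cmd, exception: true) }
  pdf = File.join(dir, "object.pdf")
  info, status = Open3.capture2("pdfinfo", pdf)
  raise "pdfinfo failed for #{pdf}" unless status.success?
  page_commands(pdf, dir, page_count(info)).each { |cmd| system(*cmd, exception: true) }
end
```

Passing each command as an argument array to `system` avoids a shell, so file names with spaces or quotes need no escaping. Building the commands separately from running them also makes it easy to print a dry run before touching any files.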

Hi everyone,

I had to do some digging through my backups, but I've resurrected the scripts I used to convert and ingest items into the AIMS application.
I've put them on GitHub here: https://github.com/cfitz/aims_scripts

...