Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Recode <cite> and <blockquote> for Confluence

...

Panel

Contents

Table of Contents
outlinetrue
stylenone

Introduction

In <cite>Automatic Automatic Format Identification using PRONOM and DROID</cite>DROID, Adrian Brown defines a "data format" as:<blockquote>The

The internal structure and encoding of a digital object,

...

which allows it to be processed, or to be rendered in human-accessible

...

form.

...

Note that this implies more than just knowing the common name of a Bitstream's format, e.g. "Adobe PDF". That name actually describes a family of formats. In order to know exactly how to recover the intelligence in a particular Bitstream, you'd want to know which specific version of PDF it is: later versions have features not found in earlier ones. The "internal structure and encoding" imposed by a data format is usually defined in exacting detail by a format specification document, and/or by the software applications that produce and consume that format.

...

This is the default algorithm that is implemented by FormatIdentifier.identifyFormat() methods that simply call the FormatHit's addToResults() method on each hit they develop.

Info

...

NOTE: It is not necessary to use this algorithm.

...

As described above, the format identification process is completely

...

under the control of the identifyFormat() method implementations.

...

  1. Start with an empty results list.
  2. Call the FormatIdentifier.identifyFormat() method of each plugin in the sequence in turn:
    • Passing it the Bitstream and list of accumulated results so it can add new results.
    • If it has a better-confidence match than the current head of the list, that hit becomes the new head of the list.
    • Otherwise the hit gets appended to the end of the list.
  3. When finished, the head of the list is the best format match.

...