Introduction
In <cite>Automatic Format Identification using PRONOM and DROID</cite>, Adrian Brown defines a "data format" as:<blockquote>
The internal structure and encoding of a digital object, which allows it to be processed, or to be rendered in human-accessible form.</blockquote>
...
Note that this implies more than just knowing the common name of a Bitstream's format, e.g. "Adobe PDF". That name actually describes a family of formats. In order to know exactly how to recover the intelligence in a particular Bitstream, you'd want to know which specific version of PDF it is: later versions have features not found in earlier ones. The "internal structure and encoding" imposed by a data format is usually defined in exacting detail by a format specification document, and/or by the software applications that produce and consume that format.
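To make the "family of formats" point concrete, here is a minimal sketch that reads the version declared in a PDF header comment (`%PDF-1.4`, for example). The class and method names are hypothetical and not part of any API described on this page. The header is only a hint: later PDF versions allow the document catalog to override it, so a real identifier would need to look further.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class PdfHeaderSketch {
    // A PDF file begins with a header comment of the form "%PDF-<major>.<minor>".
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d+\\.\\d+)");

    // Returns the version string declared in the header, if one is found
    // within the first 1024 bytes (an arbitrary limit chosen for this sketch).
    static Optional<String> headerVersion(byte[] leadingBytes) {
        int len = Math.min(leadingBytes.length, 1024);
        // ISO-8859-1 maps every byte to one char, so binary content cannot fail to decode.
        String head = new String(leadingBytes, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(head);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Two Bitstreams that both carry the common name "Adobe PDF" can yield different results here, which is exactly the distinction a format identification needs to record.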
...
This is the default algorithm, implemented by FormatIdentifier.identifyFormat() methods that simply call the FormatHit's addToResults() method on each hit they develop.
Info |
---|
...
NOTE: It is not necessary to use this algorithm. |
...
As described above, the format identification process is completely |
...
under the control of the |
...
- Start with an empty results list.
- Call the FormatIdentifier.identifyFormat() method of each plugin in the sequence in turn:
  - Passing it the Bitstream and list of accumulated results so it can add new results.
  - If it has a better-confidence match than the current head of the list, that hit becomes the new head of the list.
  - Otherwise the hit gets appended to the end of the list.
- When finished, the head of the list is the best format match.
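The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the `Sketch*` types are hypothetical stand-ins, a `byte[]` takes the place of the Bitstream, confidence is modelled as an integer where larger means more confident, and the method signatures are guesses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a format hit; the fields are assumptions.
final class SketchHit {
    final String formatName;
    final int confidence; // assumption: larger value = more confident

    SketchHit(String formatName, int confidence) {
        this.formatName = formatName;
        this.confidence = confidence;
    }

    // Default policy from the list above: a strictly better hit becomes
    // the new head; any other hit is appended to the end.
    void addToResults(List<SketchHit> results) {
        if (results.isEmpty() || confidence > results.get(0).confidence) {
            results.add(0, this);
        } else {
            results.add(this);
        }
    }
}

// Hypothetical plugin interface; the real signature may differ.
interface SketchIdentifier {
    List<SketchHit> identifyFormat(byte[] content, List<SketchHit> results);
}

final class SketchFormatManager {
    // Runs each plugin in sequence, threading the accumulated results through.
    static List<SketchHit> identify(byte[] content, List<SketchIdentifier> plugins) {
        List<SketchHit> results = new ArrayList<>(); // start with an empty list
        for (SketchIdentifier plugin : plugins) {
            results = plugin.identifyFormat(content, results);
        }
        return results; // head of the list (if any) is the best match
    }
}
```

Under this policy only the head is guaranteed to be the best hit; the rest of the list stays in arrival order and is not sorted by confidence.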
...