• Title (goal): Archive of digitised newspapers
  • Primary Actor: The State and University Library, Denmark
  • Scope: Organization (black-box)
  • Level: Very high level summary
  • (Story): 30 million pages of newspapers ingested over a 3-year period. Data consists of externally referenced JPEG 2000 files. Metadata consists of JPEG 2000 file analysis (using jpylyzer), MODS metadata, MIX technical metadata and ALTO OCR files. Access is by daily harvesting of new and changed objects into a dissemination system.

3 Comments

  1. Could you elaborate, Kåre Fiedler Christiansen, on the "dissemination system", and on what role, if any, Fedora plays in it? Is this OAI-PMH?

    1. Kåre Fiedler Christiansen, have you had a chance to consider Andrew's comment above?

  2. Sorry for the much delayed reply.

    The system is currently based on Fedora 3, because we needed to have it implemented before Fedora 4 could go gold.

    Our dissemination system is based on an in-house Solr workflow, in which we bundle a few objects from Fedora (an object for the page, an object describing metadata for the newspaper edition the page is part of, and an object containing metadata about the newspaper title) into one document in the Solr index.
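
The bundling described above could look roughly like the following. This is only a minimal sketch: the comment does not show the actual workflow, so the function name `bundle_page_document`, the field names, and the prefixing scheme are all hypothetical, chosen to illustrate how three records might be flattened into one index document keyed by the page.

```python
from typing import Any, Dict


def bundle_page_document(
    page: Dict[str, Any],
    edition: Dict[str, Any],
    title: Dict[str, Any],
) -> Dict[str, Any]:
    """Flatten three records into one index document (hypothetical layout).

    Each input is assumed to be a plain dict of metadata already extracted
    from the corresponding Fedora object; the page dict must contain "pid".
    """
    # The page is the unit of retrieval, so its PID becomes the document id.
    doc: Dict[str, Any] = {"id": page["pid"]}
    # Prefix every field with its source so that names cannot collide.
    for prefix, record in (("page", page), ("edition", edition), ("title", title)):
        for key, value in record.items():
            doc[f"{prefix}_{key}"] = value
    return doc


# Illustrative input only; these PIDs and fields are made up.
example = bundle_page_document(
    {"pid": "uuid:page-1", "number": 3},
    {"pid": "uuid:edition-1", "date": "1901-05-17"},
    {"pid": "uuid:title-1", "name": "Example Gazette"},
)
```

Prefixing by source keeps the edition and title metadata searchable alongside the page while still making it clear which Fedora object each field came from.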

    We have written our own harvesting software, based on queries to the Fedora resource index. It is very similar to the PROAI provider.
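
As an illustration of what a resource index query for new and changed objects might look like, here is a small sketch that only builds the request URL and does not contact a server. It assumes the Fedora 3 `risearch` endpoint with SPARQL enabled and the `lastModifiedDate` property from the Fedora view namespace; the function name `build_harvest_url`, the base URL, and the exact query shape are assumptions, not the library's actual harvester.

```python
from urllib.parse import urlencode

# Property Fedora 3 records in the resource index for an object's last change.
LAST_MODIFIED = "info:fedora/fedora-system:def/view#lastModifiedDate"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def build_harvest_url(base_url: str, since: str, limit: int = 1000) -> str:
    """Build a risearch URL listing objects modified after `since`.

    `since` is an ISO 8601 timestamp such as "2014-01-01T00:00:00Z".
    The query shape is an assumption made for illustration.
    """
    query = (
        "SELECT ?object ?modified WHERE { "
        f"?object <{LAST_MODIFIED}> ?modified . "
        f'FILTER (?modified > "{since}"^^<{XSD_DATETIME}>) '
        "} ORDER BY ?modified"
    )
    params = {
        "type": "tuples",
        "lang": "sparql",
        "format": "CSV",
        "limit": str(limit),
        "query": query,
    }
    return f"{base_url.rstrip('/')}/risearch?{urlencode(params)}"


# Hypothetical local repository URL, used only to show the call.
url = build_harvest_url("http://localhost:8080/fedora/", "2014-01-01T00:00:00Z", limit=50)
```

A daily harvest along these lines would remember the latest `?modified` value it has seen and pass it as `since` on the next run.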