This page documents our first attempt at mapping the National Library of Wales newspaper content to PCDM and IIIF. A diagram of the Newspaper model in PCDM is below:
Note:
- Title: a Newspaper Title, which has a record in our MARC catalogue.
- Phase: a physical unit of handling (for digitisation), usually a set of issues physically bound together. We use this information to manage digitisation batches; it is not displayed to users.
- Issue: an issue of a Newspaper, with an ISO issue date as metadata.
- Article: a newspaper article. An article can span Pages and can occupy multiple columns on one page; a single page usually carries many articles.
- Page: a physical page of a Newspaper, which is also the container for the scanned image of that page.
- Archival Copy: the archival TIFF, held in HSM (hierarchical storage management) near-line storage and referenced over HTTP from Fedora.
- JP2: the reference version of the page, currently stored as a managed datastream in Fedora.
- ALTO: OCR text, article boundaries and coordinate information generated from the TIFF.
- We've added IIIF classes and relations to the diagram above to represent the OCR text annotations on a page and the related article metadata.
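As a rough illustration of the containment described in the notes above, the sketch below emits N-Triples for a Title, one Issue and its Pages, with each Page holding its archival TIFF, JP2 and ALTO files. It is a minimal sketch, not our implementation: the base URI and identifiers are hypothetical placeholders, every node is typed as `pcdm:Object` (whether a Title would be better as a `pcdm:Collection` is left open), and Phase and Article are omitted because Phase is internal-only and Articles are modelled on the IIIF side.

```python
# Minimal sketch: Title > Issue > Page hierarchy and page files as PCDM triples.
# BASE and all identifiers are hypothetical placeholders, not real NLW URIs.
PCDM = "http://pcdm.org/models#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/newspapers/"  # hypothetical base URI


def uri(path: str) -> str:
    """Wrap a resource path as an N-Triples IRI."""
    return f"<{BASE}{path}>"


def triple(s: str, p: str, o: str) -> str:
    """Format one N-Triples statement; s and o are already-wrapped IRIs."""
    return f"{s} <{p}> {o} ."


def newspaper_triples(title: str, issue: str, pages: list[str]) -> list[str]:
    """Return N-Triples for one Title, one Issue, and its Pages with their files."""
    t, i = uri(title), uri(f"{title}/{issue}")
    out = [
        triple(t, RDF_TYPE, f"<{PCDM}Object>"),
        triple(i, RDF_TYPE, f"<{PCDM}Object>"),
        triple(t, PCDM + "hasMember", i),  # Title hasMember Issue
    ]
    for page in pages:
        p = uri(f"{title}/{issue}/{page}")
        out.append(triple(p, RDF_TYPE, f"<{PCDM}Object>"))
        out.append(triple(i, PCDM + "hasMember", p))  # Issue hasMember Page
        # Each Page holds three files: archival TIFF, JP2 and ALTO XML.
        for ext in ("tif", "jp2", "xml"):
            f = uri(f"{title}/{issue}/{page}.{ext}")
            out.append(triple(f, RDF_TYPE, f"<{PCDM}File>"))
            out.append(triple(p, PCDM + "hasFile", f))
    return out


if __name__ == "__main__":
    for line in newspaper_triples("title-123", "1890-01-04", ["page-1"]):
        print(line)
```

Using `pcdm:hasMember` for the Title/Issue/Page levels and `pcdm:hasFile` only at the Page keeps the image and OCR files attached to the physical page, matching the note that a Page is the container for its scanned image.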
Questions:
- The Portland Common Data Model specifies that rdfs:label on a File is repeatable. We could only think of a file having a single filename; is there an example where two labels for a File might be required?
- We haven't put it in the diagram, but suppose a Manuscript had two orders: a physical order (the order the physical material is in before scanning) and a logical order (for example, the front covers have been moved from the back to the front). Would you have one object for the Manuscript and two member objects, one for each order?
- This is probably a IIIF question, but we struggled to link the text of an article with the article object. We modelled a Newspaper Article as a Range (as it can cross pages), but we couldn't see how to add an annotation to a Range, as an annotation seemed to be limited to targeting a Canvas.
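To make the last question concrete, the sketch below builds the structure it describes in IIIF Presentation API 2.x style JSON: an article Range listing the canvas regions it covers on two pages, and one text annotation per region whose `on` target is a Canvas fragment. It assumes Presentation 2.x vocabulary (`sc:Range`, `oa:Annotation`, `cnt:ContentAsText`); the base URI, article id, coordinates and text are hypothetical. It only illustrates the gap being asked about and does not propose a solution.

```python
import json

# Hypothetical base URI for one issue; not a real NLW IIIF endpoint.
BASE = "http://example.org/iiif/issue-1"


def canvas_region(page: int, x: int, y: int, w: int, h: int) -> str:
    """Canvas URI with an xywh media fragment selecting one article block."""
    return f"{BASE}/canvas/p{page}#xywh={x},{y},{w},{h}"


def article_range(article_id: str, regions: list[str]) -> dict:
    """An article modelled as a Range listing the canvas regions it covers."""
    return {
        "@id": f"{BASE}/range/{article_id}",
        "@type": "sc:Range",
        "label": article_id,
        "canvases": regions,
    }


def text_annotations(article_id: str, segments: list[tuple[str, str]]) -> list[dict]:
    """One OCR text annotation per (region, text) pair.

    Each annotation targets a Canvas region; nothing here points the
    text at the Range itself, which is the link the question asks about.
    """
    return [
        {
            "@id": f"{BASE}/annotation/{article_id}-{n}",
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "resource": {
                "@type": "cnt:ContentAsText",
                "format": "text/plain",
                "chars": text,
            },
            "on": region,
        }
        for n, (region, text) in enumerate(segments, start=1)
    ]


if __name__ == "__main__":
    segments = [
        (canvas_region(1, 100, 200, 400, 900), "First part of the article..."),
        (canvas_region(2, 50, 80, 400, 300), "...continued on page 2."),
    ]
    rng = article_range("article-7", [region for region, _ in segments])
    annos = text_annotations("article-7", segments)
    print(json.dumps({"range": rng, "annotations": annos}, indent=2))
```

The Range and the annotations share the same canvas regions, so a client could match them up by target, but that association is implicit rather than an explicit relation from the article to its text.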
Comments welcome!