  • Samvera Newspapers Interest Group Call: 2017-11-09


Moderator: Eben English (Boston Public Library)

Notetaker: Eben English



  • Eben English (BPL)
  • Brian McBride (Utah)
  • Sean Upton (Utah)
  • Kalee Sprague (Yale)
  • Randall Floyd (Indiana)


  1. Ongoing work
    1. IUPUI / Indiana
      1. Andy Smith at IUPUI has done quite a bit of work on newspaper ingest
      2. App based on CurationConcerns and Princeton's Plum?
      3. Migrating content from CONTENTdm
      4. Has a YAML format for describing the structural metadata used for ingest
      5. Crosswalks between METS structural metadata and the YAML format
    2. Yale
      1. Has content in CONTENTdm and wants to migrate it
      2. Article-level segmentation
      3. Can provide samples
      4. Eben will send Kalee a link to Google Drive folder where BPL-Utah grant project is collecting samples
  2. PCDM model review:
  3. Grant documentation review:
    1. NewspaperWorks Design Overview:
    2. NewspaperViews Design Overview:
  4. Ingest scenarios:
    1. Ingest a PDF which is a single issue (of a single title)
    2. Ingest a batch of PDFs which are issues (of a single title)
    3. NDNP data
      1. Page level
      2. Article level
      3. Image files are pages
      4. Image files are articles
    4. Arbitrary files
      1. Ingest n master files (page image(s)) as a single issue (of a single title)
      2. Ingest a batch of master files (page images) which are multiple issues (of a single title)
      3. Ingest n master files (article images) which are a single article (of a single issue of a single title)
      4. Ingest a batch of master files (article images) which are articles (of a single issue of a single title?)
  5. Controlled genre list for articles
  6. Other agenda items
    1. Proxy ordering for articles in issues
      1. Still unsure if this is needed.
    2. Files for articles
      1. Articles may have binary files representing the page image segment
    3. Generic vs. specific
      1. Question was raised about whether this model is too specific to newspaper content, which might be daunting to some implementers
      2. Could potentially pursue a more generic 'paged media' model
      3. Do newspapers have any intrinsic features that are not shared with other common paged media objects such as books or magazines?
      4. Could call it 'Periodical' rather than 'Newspaper'?
      5. More thought needed.
      6. ArcLight may have some modeling that could be useful?
    4. Performance could be an issue
      1. Indiana ran into performance problems when objects had more than roughly 200 pages.
      2. Valkyrie provides better performance.
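
To make the Indiana crosswalk item concrete, here is a minimal sketch of a METS-to-YAML crosswalk in Python. Indiana's actual YAML format was not shown on the call, so the output keys (`type`, `label`, `order`, `files`, `children`) and the function names `mets_to_outline` and `to_yaml` are hypothetical. The sketch reads only the `mets:div` hierarchy in the first `structMap`, collects `mets:fptr` `FILEID` references, and sorts siblings by `ORDER`. It uses only the standard library. Scalars are written as JSON strings, which YAML parsers accept as double-quoted scalars.

```python
import json
import xml.etree.ElementTree as ET

# METS namespace. The output keys below are a hypothetical YAML shape,
# not Indiana's actual ingest schema.
METS_NS = {"mets": "http://www.loc.gov/METS/"}


def div_to_dict(div):
    """Convert one mets:div and its descendants into a nested dict."""
    node = {"type": div.get("TYPE", ""), "label": div.get("LABEL", "")}
    if div.get("ORDER") is not None:
        node["order"] = int(div.get("ORDER"))
    # mets:fptr FILEID values point at files declared in the fileSec.
    files = [fptr.get("FILEID") for fptr in div.findall("mets:fptr", METS_NS)]
    if files:
        node["files"] = files
    children = [div_to_dict(child) for child in div.findall("mets:div", METS_NS)]
    if children:
        # Sort siblings by ORDER so the page sequence is explicit.
        # Divs without ORDER are treated as 0; the sort is stable.
        children.sort(key=lambda c: c.get("order", 0))
        node["children"] = children
    return node


def mets_to_outline(mets_xml):
    """Crosswalk the top-level div of the first structMap."""
    root = ET.fromstring(mets_xml)
    struct_map = root.find("mets:structMap", METS_NS)
    if struct_map is None:
        raise ValueError("no mets:structMap found")
    top = struct_map.find("mets:div", METS_NS)
    if top is None:
        raise ValueError("structMap has no mets:div")
    return div_to_dict(top)


def to_yaml(node, indent=0):
    """Emit a minimal block-style YAML rendering without PyYAML."""
    pad = "  " * indent
    lines = [f"{pad}{key}: {json.dumps(node[key])}"
             for key in ("type", "label", "order") if key in node]
    if "files" in node:
        lines.append(f"{pad}files:")
        lines.extend(f"{pad}  - {json.dumps(file_id)}" for file_id in node["files"])
    if "children" in node:
        lines.append(f"{pad}children:")
        for child in node["children"]:
            child_lines = to_yaml(child, indent + 2).splitlines()
            # Turn the child's first key into a sequence item ("- key: ...").
            child_lines[0] = f"{pad}  - {child_lines[0].lstrip()}"
            lines.extend(child_lines)
    return "\n".join(lines)


# Hypothetical sample document: pages are listed out of order on purpose.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/">
  <structMap TYPE="physical">
    <div TYPE="issue" LABEL="Sample issue">
      <div TYPE="page" LABEL="Page 2" ORDER="2"><fptr FILEID="p2_tif"/></div>
      <div TYPE="page" LABEL="Page 1" ORDER="1"><fptr FILEID="p1_tif"/></div>
    </div>
  </structMap>
</mets>"""

if __name__ == "__main__":
    print(to_yaml(mets_to_outline(SAMPLE)))
```

Running the module prints the issue with its pages reordered by `ORDER` (Page 1 before Page 2). A real crosswalk would also need to resolve `FILEID` values against the `fileSec` and handle logical (article-level) structure maps, which this sketch ignores.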
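
The proxy-ordering question for articles in issues can be illustrated with a small model of the PCDM ordering pattern. In that pattern, an ordered parent points to its first and last `ore:Proxy`, each proxy names its member via `ore:proxyFor`, and the proxies are chained with IANA `next`/`prev` links. The sketch below is plain Python, not an RDF, Hyrax, or ActiveFedora API. The class and function names (`Proxy`, `OrderedIssue`, `build_order`, `ordered_articles`) and the proxy ID scheme are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Proxy:
    """Plain-Python stand-in for an ore:Proxy (hypothetical, not an RDF API)."""
    proxy_id: str
    proxy_for: str                 # ore:proxyFor: the article being ordered
    next: Optional[str] = None     # iana:next: ID of the following proxy
    prev: Optional[str] = None     # iana:prev: ID of the preceding proxy


@dataclass
class OrderedIssue:
    issue_id: str
    proxies: Dict[str, Proxy] = field(default_factory=dict)
    first: Optional[str] = None    # iana:first
    last: Optional[str] = None     # iana:last


def build_order(issue_id: str, article_ids: List[str]) -> OrderedIssue:
    """Chain one proxy per article in the given reading order."""
    issue = OrderedIssue(issue_id)
    prev_id: Optional[str] = None
    for position, article_id in enumerate(article_ids):
        proxy_id = f"{issue_id}/proxy{position}"  # hypothetical ID scheme
        issue.proxies[proxy_id] = Proxy(proxy_id, article_id, prev=prev_id)
        if prev_id is None:
            issue.first = proxy_id
        else:
            issue.proxies[prev_id].next = proxy_id
        prev_id = proxy_id
    issue.last = prev_id
    return issue


def ordered_articles(issue: OrderedIssue) -> List[str]:
    """Walk the chain from iana:first, following next links."""
    result: List[str] = []
    current = issue.first
    while current is not None:
        proxy = issue.proxies[current]
        result.append(proxy.proxy_for)
        current = proxy.next
    return result


if __name__ == "__main__":
    issue = build_order("issue-1", ["article-a", "article-b", "article-c"])
    print(ordered_articles(issue))
```

Reading the order means walking the chain one proxy at a time, and every ordered member adds a proxy resource. If the reading order of articles within an issue turns out not to matter, an unordered `pcdm:hasMember` relationship avoids that bookkeeping. That trade-off is what the open question about proxy ordering comes down to.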