This document walks through a sample migration using Islandora Workbench and spreadsheets from the Whitman College migration pilot.

General Workflow

Build a collection on the Islandora 2.0 server to accept inputs (from the browser interface).

An alternative would be to build the collection using the workbench tool. This means the first line of the spreadsheet would build the collection in a known location and subsequent lines in the spreadsheet use the unique identifier of the collection object (usually the PID) in the parent_id column, and the field_member_of column is left blank.
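A minimal sketch of that alternative layout, assuming a CSV with `id`, `parent_id`, `title`, `field_model`, and `field_member_of` columns. The identifiers, titles, and the `Collection` and `Image` model values are hypothetical placeholders, not values taken from the pilot spreadsheets.

```python
import csv
import io

def collection_first_rows(collection_id, collection_title, items):
    """Return CSV rows where the first row creates the collection and every
    later row points at it through parent_id (field_member_of stays blank)."""
    rows = [{
        "id": collection_id,
        "parent_id": "",
        "title": collection_title,
        "field_model": "Collection",   # hypothetical model value
        "field_member_of": "",
    }]
    for item_id, title in items:
        rows.append({
            "id": item_id,
            "parent_id": collection_id,  # links the item to the first row
            "title": title,
            "field_model": "Image",      # hypothetical model value
            "field_member_of": "",       # left blank in this layout
        })
    return rows

def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = collection_first_rows(
    "demo:collection", "Demo Collection",
    [("demo:1", "First photograph"), ("demo:2", "Second photograph")],
)
print(to_csv(rows))
```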

Build a config file with defaults to cover accessibility, authorship, access, etc.
Verify the input spreadsheet: make sure the column headers are valid field names, and build URLs from PIDs if necessary (an easy macro in the spreadsheet).
Dry run, then run. (Both from the command line.)
Check results, then accept, or roll back as necessary.
Add the spreadsheet to the input archive.
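The URL-building part of the verification step can also be done outside the spreadsheet. The sketch below assumes the source files can be downloaded from an Islandora 7 style datastream URL; the base URL, the `OBJ` datastream, and the `PID` and `file` column names are assumptions to adjust for your own source system.

```python
from urllib.parse import quote

def pid_to_url(pid, base_url="https://old.example.edu", datastream="OBJ"):
    """Build a download URL for one PID (assumed Islandora 7 URL layout)."""
    # PIDs contain a colon (namespace:number), which is percent-encoded here.
    encoded = quote(pid, safe="")
    return f"{base_url.rstrip('/')}/islandora/object/{encoded}/datastream/{datastream}/download"

def add_file_urls(rows, pid_column="PID", file_column="file"):
    """Return copies of the rows with a URL column derived from the PID column."""
    return [{**row, file_column: pid_to_url(row[pid_column])} for row in rows]

rows = add_file_urls([{"PID": "demo:42", "title": "Example"}])
print(rows[0]["file"])
```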

Simple Objects (e.g. Images)

Step 1: Provision a Local Environment

If you have not already done so, provision a local Islandora 2.0 environment to use for the sample ingests. We recommend using ISLE with the default install profile.

Step 2: Create a Collection in Islandora 2.0

Islandora Workbench works best when ingesting one collection at a time. To begin, log in to Islandora 2.0 in your web browser and create a new collection called “Pacific Northwest Agricultural Photograph Collection”.

Step 3: Acquire The CSV

Islandora Workbench requires a CSV in either Google Sheets or on your local disk. The AG_Photos spreadsheet is provided as a sample input_csv and can be uploaded to your Google Drive.

https://wiki.lyrasis.org/download/attachments/273351526/Newspapers1909-10.xlsx?api=v2

Step 4: Build a Config File

Islandora Workbench uses YAML files to configure its operations. These files are documented in detail already, so for the purposes of this sample ingest we will use this config file:


task: create
host: "https://islandora.traefik.me/"
username: xxxx
password: xxxx
media_type: file
input_csv: 'xxx'
id_field: PID
csv_field_templates:
 - field_rights: "http://rightsstatements.org/vocab/CNE/1.0/"
 - field_member_of: xxxx
 - field_model: xxxx
 - field_resource_type: xxxx
 - field_display_hints: xxxx
default_file_mimetype: 'image/tiff'
default_file_extension: ".tif"
use_node_title_for_media: 1
allow_adding_terms: true

The csv_field_templates entries set field values that apply to every resource in the collection. The xxxx placeholders stand for Drupal IDs (a node ID for the collection, taxonomy term IDs for the other fields); replace them with the IDs from your own Drupal instance:

input_csv

The public link to your spreadsheet in Google Sheets.

Note: If the gid of your spreadsheet is not 0, you may need to set google_sheets_gid to the value from your spreadsheet. More information is available in the relevant Workbench documentation.
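The gid normally appears in the address bar of the open sheet, either after `#gid=` or as a `gid` query parameter. As a small illustration, the hypothetical helper below pulls it out of a copied URL so it can be used as the `google_sheets_gid` value; the URL shown is made up.

```python
from urllib.parse import urlparse, parse_qs

def sheet_gid(url):
    """Return the gid found in a Google Sheets URL, or None if there is none."""
    parsed = urlparse(url)
    # The gid may appear in the fragment (#gid=123) or the query string (?gid=123).
    for part in (parsed.fragment, parsed.query):
        values = parse_qs(part).get("gid")
        if values:
            return int(values[0])
    return None

url = "https://docs.google.com/spreadsheets/d/SHEET_ID/edit#gid=1867618389"
print(sheet_gid(url))
```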

field_member_of

This is the Node ID of the collection you created in step 2. You can find the ID by hovering over any of the tabs when you view the collection - it will be in the URL as “/node/id”.
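If you prefer to copy the tab link rather than read the number off the status bar, a short sketch like the following extracts the ID from it. The function name and the sample URLs are illustrative only.

```python
import re

def node_id_from_url(url):
    """Return the numeric ID from a Drupal /node/<id> URL, or None."""
    match = re.search(r"/node/(\d+)(?:/|$|\?|#)", url)
    return int(match.group(1)) if match else None

print(node_id_from_url("https://islandora.traefik.me/node/12/edit"))
```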

field_model

The ID of the Islandora Model used by items in this collection. You can find a list of models and their associated term IDs by going to https://your.site/admin/structure/taxonomy/manage/islandora_models/overview. In this case, this is a collection of images, so we will go with the Image model.

field_resource_type

The ID of the resource type used by items in this collection. This is likely to be similar to the Islandora Model used above. You can find a list of resource types and associated IDs by going to https://your.site/admin/structure/taxonomy/manage/resource_types/overview. We will use the Image resource type for this collection.

field_display_hints

Display hints indicate which viewer should be used to display an item. You can find the list of display hints and associated IDs at https://your.site/admin/structure/taxonomy/manage/islandora_display/overview. These are large images, so we’ll want to use the OpenSeadragon viewer.
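Before moving on, it can help to confirm that none of the placeholder values from the sample config are left in your own file. The sketch below is a simple line-based check, not a YAML parser, and it assumes placeholders are written as three or more `x` characters as in the sample above.

```python
import re

# Matches "name: xxxx" or " - name: 'xxx'" style lines.
PLACEHOLDER = re.compile(r"""^\s*(?:-\s*)?([A-Za-z_]+)\s*:\s*['"]?x{3,}['"]?\s*$""")

def unfilled_settings(config_text):
    """List the setting names whose value is still an xxx/xxxx placeholder."""
    names = []
    for line in config_text.splitlines():
        match = PLACEHOLDER.match(line)
        if match:
            names.append(match.group(1))
    return names

sample = """\
task: create
username: admin
input_csv: 'xxx'
csv_field_templates:
 - field_member_of: 12
 - field_model: xxxx
"""
print(unfilled_settings(sample))
```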

Step 5: Prepare Ingest Spreadsheets

For this exercise we’ll be using a sample collection of 100 photographs. You can make a copy of the spreadsheet to try this exercise locally.

List of Fields

Label	Machine Name	Field Type	Vocabulary
Identifier	field_identifier	Text(plain)
Title	title	Text(plain)
Description	field_description_long	Text(formatted, long)
Abstract	field_abstract	Text(formatted, long)
Date (EDTF)	field_edtf_date	EDTF
Date	field_date_display	Text(plain)
Date Created	field_edtf_date_created	EDTF
Subject	field_subject	Entity Reference	Subject
Geographic Subject	field_geographic_subject	Entity Reference	Geographic Subject
Genre	field_genre	Entity Reference	Genre
Extent	field_extent	Text(plain)
Source	field_source	Text(plain)
Language	field_language	Entity Reference	Language
Contact Us	field_rights_contact	Text(formatted, long)
Rights	field_rights	Link
Resource Type	field_resource_type	Entity Reference	Resource Type
Linked Agent	field_linked_agent	Typed Relation	Family, Corporate Body, Person

Each of these fields will need to exist in your Islandora 2.0 installation prior to running Workbench or the operation will fail. Any fields that don’t already exist can be created using the Drupal interface. The name of the field in the CSV must match the machine name of the field in Drupal. Each field has a corresponding type that will also need to be set. This will be covered in detail in the next step.
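Because the CSV column names must match the Drupal machine names exactly, a quick header comparison can catch typos before Workbench is run. The sketch below uses the machine names from the table above; `PID`, `title`, and `file` are assumed to be the remaining non-field columns in the spreadsheet.

```python
import csv
import io

# Machine names from the List of Fields table, plus assumed non-field columns.
KNOWN_COLUMNS = {
    "PID", "title", "file",
    "field_identifier", "field_description_long", "field_abstract",
    "field_edtf_date", "field_date_display", "field_edtf_date_created",
    "field_subject", "field_geographic_subject", "field_genre",
    "field_extent", "field_source", "field_language",
    "field_rights_contact", "field_rights", "field_resource_type",
    "field_linked_agent",
}

def unknown_columns(csv_text, known=KNOWN_COLUMNS):
    """Return the header names that are not in the known set."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [name for name in header if name.strip() not in known]

print(unknown_columns("PID,title,field_subject,field_subjects\n"))
```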

Step 6: Configure Drupal Fields and Taxonomies

Taxonomies will need to be created for fields that use the Entity Reference or Typed Relation field types. For demo purposes you can leave these fields unpopulated or add a few test terms; new terms will be added automatically during the ingest if they do not already exist.

We will create the following taxonomies for this exercise: Subject, Geographic Subject, Genre, Language, Corporate Body, Family, and Person. Resource Type should already exist by default. Each taxonomy can be created in the same way.

Go to Structure > Taxonomy to view existing vocabularies.
Click Add Vocabulary to create the Subject vocabulary (if it does not already exist)
Name it Subject and click Save
Click Add Term to populate the list, or leave it blank to be filled automatically during ingest
Follow the same steps to create the remaining vocabularies.

Once the vocabularies have been created you can proceed to create each field in the table above using the appropriate machine name, field type, and vocabulary. Follow these steps for each field:

Navigate to Structure > Content types > Repository item > Manage fields
Click Add Field
Select the Field Type based on the table above

Note: For Entity Reference field types you will need to select Taxonomy Term under Typed relation when setting the field type.

Add a Label based on the table
Save the Field settings
For Entity Reference and Typed Relation fields:

Check the “Create referenced entities if they don't already exist” box
Select the appropriate vocabulary (or vocabularies) based on the table

For Typed Relation Fields

You must populate the list of “Available Relators”. For the Linked Agent field, paste in the following list:

Note: RDF mappings are defined in a YAML configuration file. You can view and edit this file by going to /admin/config/development/configuration/single/export. Select RDF mapping for Configuration type and node.islandora_object for Configuration name. You can view the list of available RDF namespaces by going to /admin/config/search/jsonld.

Step 7: Check, Then Run

You should always check that your configuration and spreadsheet are valid before running the ingest. Fortunately, Islandora Workbench makes this easy with the --check option:

./workbench --config config.yml --check

The check will report any errors so you can fix them before running the ingest.

Once no more errors are present, simply run the same command without --check:

./workbench --config config.yml
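The two commands can be chained so that the ingest starts only after a clean check. This is a sketch under the assumption that Workbench exits with a non-zero status when the check finds errors; the function names are illustrative and not part of Workbench.

```python
import subprocess

def workbench_command(config_path, check=False):
    """Build the command line used in this step."""
    command = ["./workbench", "--config", config_path]
    if check:
        command.append("--check")
    return command

def check_then_run(config_path, runner=subprocess.run):
    """Run the check first and start the ingest only if the check succeeded."""
    check = runner(workbench_command(config_path, check=True))
    if check.returncode != 0:
        return False  # stop here until the reported errors are fixed
    ingest = runner(workbench_command(config_path))
    return ingest.returncode == 0
```

Calling `check_then_run("config.yml")` from the Workbench directory would run both steps in order. The `runner` parameter exists only so the logic can be exercised without a live Islandora site.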

Complex Objects (e.g. Newspapers)

Complex objects like newspapers and books follow a similar pattern, except parent and child relationships need to be established.

Step 1: Provision a Local Environment

If you have not already done so, provision a local Islandora 2.0 environment to use for the sample ingests. We recommend using ISLE with the default install profile.

Step 2: Create a Collection in Islandora 2.0

Islandora Workbench works best when ingesting one collection at a time. To begin, log in to Islandora 2.0 in your web browser and create a new collection called “Whitman College Pioneer, 1896-11-01”.

Step 3: Acquire the CSV

Islandora Workbench requires a CSV in either Google Sheets or on your local disk. The Newspaper spreadsheet is provided as a sample input_csv and can be uploaded to your Google Drive.

Newspaper.xlsx

Step 4: Build a Config File

Islandora Workbench uses YAML files to configure its operations. These files are documented in detail already, so for the purposes of this sample ingest we will use this config file:

task: create
host: "https://islandora.traefik.me/"
username: xxxx
password: xxxx
input_csv: 'https://wiki.lyrasis.org/download/attachments/273351526/Newspaper_fixed.xlsx?version=1&modificationDate=1674548425416&api=v2'
id_field: id
csv_field_templates:
 - field_rights: "http://rightsstatements.org/vocab/CNE/1.0/"
 - field_display_hints: xxxx
use_node_title_for_media: 1
allow_adding_terms: true
list_missing_drupal_fields: true

The csv_field_templates entries set field values that apply to every resource in the collection. The xxxx placeholder under csv_field_templates stands for a Drupal taxonomy term ID; replace it with the ID from your own Drupal instance:

input_csv

The public link to your spreadsheet in Google Sheets.

Note: If the gid of your spreadsheet is not 0, you may need to set google_sheets_gid to the value from your spreadsheet. More information is available in the relevant Workbench documentation.

field_display_hints

Display hints indicate which viewer should be used to display an item. You can find the list of display hints and associated IDs at https://your.site/admin/structure/taxonomy/manage/islandora_display/overview. These are large images, so we’ll want to use the OpenSeadragon viewer.

Step 5: Prepare Ingest Spreadsheets

For this exercise we’ll be using a sample collection of 100 newspaper issues and pages. You can make a copy of the spreadsheet to try this exercise locally.

The first two columns are ‘id’ and ‘parent_id’. Each item is created one at a time, so the first item (a newspaper issue) serves as the parent for the subsequent items (pages). The field_weight column sets the order of the pages, and field_member_of puts the issues in the top-level collection.
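Since rows are processed in order, a page whose parent appears later in the sheet (or not at all) cannot be attached to it. The sketch below checks that ordering on rows already read from the CSV; the row values are made up for illustration.

```python
def parent_errors(rows):
    """Report rows whose parent_id does not match an id seen on an earlier row."""
    seen = set()
    errors = []
    for number, row in enumerate(rows, start=2):  # row 1 is the CSV header
        parent = row.get("parent_id", "").strip()
        if parent and parent not in seen:
            errors.append(f"row {number}: parent_id '{parent}' not found above")
        seen.add(row["id"].strip())
    return errors

rows = [
    {"id": "issue1", "parent_id": "", "field_weight": ""},
    {"id": "page1", "parent_id": "issue1", "field_weight": "1"},
    {"id": "page2", "parent_id": "issue2", "field_weight": "2"},
]
print(parent_errors(rows))
```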

List of Fields

Label	Machine Name	Field Type	Vocabulary
	parent_id
Weight	field_weight	Number(integer)
Member Of	field_member_of	Entity Reference
Model	field_model	Entity Reference
Identifier	field_identifier	Text(plain)
Title	title	Text(plain)
Description	field_description_long	Text(formatted, long)
Abstract	field_abstract	Text(formatted, long)
Date Issued	field_edtf_date_issued	EDTF
Date (EDTF)	field_edtf_date	EDTF
Date	field_date_display	Text(plain)
Date Created	field_edtf_date_created	EDTF
Volume	field_volume_num	Text(plain)
Issue	field_issue_num	Text(plain)
Subject	field_subject	Entity Reference	Subject
Geographic Subject	field_geographic_subject	Entity Reference	Geographic Subject
Genre	field_genre	Entity Reference	Genre
Extent	field_extent	Text(plain)
Source	field_source	Text(plain)
Language	field_language	Entity Reference	Language
Contact Us	field_rights_contact	Text(formatted, long)
Rights	field_rights	Link
Resource Type	field_resource_type	Entity Reference	Resource Type
Linked Agent	field_linked_agent	Typed Relation	Family, Corporate Body, Person

Each of these fields will need to exist in your Islandora 2.0 installation prior to running Workbench or the operation will fail. Any fields that don’t already exist can be created using the Drupal interface. The name of the field in the CSV must match the machine name of the field in Drupal. Each field has a corresponding type that will also need to be set. This will be covered in detail in the next step.

Step 6: Configure Drupal Fields and Taxonomies

Taxonomies will need to be created for fields that use the Entity Reference or Typed Relation field types. For demo purposes you can leave these fields unpopulated or add a few test terms; new terms will be added automatically during the ingest if they do not already exist.

We will create the following taxonomies for this exercise: Subject, Geographic Subject, Genre, Language, Corporate Body, Family, and Person. Resource Type should already exist by default. Each taxonomy can be created in the same way.

Go to Structure > Taxonomy to view existing vocabularies.
Click Add Vocabulary to create the Subject vocabulary (if it does not already exist)
Name it Subject and click Save
Click Add Term to populate the list, or leave it blank to be filled automatically during ingest
Follow the same steps to create the remaining vocabularies.

Once the vocabularies have been created you can proceed to create each field in the table above using the appropriate machine name, field type, and vocabulary. Follow these steps for each field:

Navigate to Structure > Content types > Repository item > Manage fields
Click Add Field
Select the Field Type based on the table above

Note: For Entity Reference field types you will need to select Taxonomy Term under Typed relation when setting the field type.

Add a Label based on the table
Save the Field settings
For Entity Reference and Typed Relation fields:

Check the “Create referenced entities if they don't already exist” box
Select the appropriate vocabulary (or vocabularies) based on the table

For Typed Relation Fields

You must populate the list of “Available Relators”. For the Linked Agent field, paste in the following list:

Note: RDF mappings are defined in a YAML configuration file. You can view and edit this file by going to /admin/config/development/configuration/single/export. Select RDF mapping for Configuration type and node.islandora_object for Configuration name. You can view the list of available RDF namespaces by going to /admin/config/search/jsonld.

Step 7: Check, Then Run

You should always check that your configuration and spreadsheet are valid before running the ingest. Fortunately, Islandora Workbench makes this easy with the --check option:

./workbench --config config.yml --check

The check will report any errors so you can fix them before running the ingest.

Once no more errors are present, simply run the same command without --check:

./workbench --config config.yml
