Sufia - documentation https://drive.google.com/drive/folders/1AvxAVFusdFBjgKU3HD64mcp8jcElelKc?usp=sharing
DSpace to Sufia - Collections are missing - the Display Set WG came to be the Collections Extension (CE) WG - Lynette picked it up and the Collections Extension is almost finished
Not just data: some features are missing too - UM wants to see which features are important and how we're doing data migration
Hoping to get folks who've migrated like SVT
Josh Gum from OSU - want to hear Josh's experience
Who is done with migrations?
Josh - repairing metadata and migrating it over to the appropriate admin sets, for example - lots of DSpace DB queries to get an idea of the state of the data - launching a big effort. Discovering metadata issues before and after migration - querying DSpace now to check back when uncovering metadata quality issues
Benefits in how we migrated - handles (hdl) were issued in DSpace - we have that key in the index - can query for handles - those are things that come from DSpace - there is no handle in Hyrax
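A minimal sketch of what "query for handles" could look like against a Solr-style index. The field name `dspace_handle_ssim` and the example handle are hypothetical, not taken from OSU's setup; only standard Solr query parameters (`q`, `fl`, `rows`) are used.

```python
from urllib.parse import urlencode

# Hypothetical index field holding the original DSpace handle.
HANDLE_FIELD = "dspace_handle_ssim"


def handle_query_params(handle=None, rows=10):
    """Build Solr query-string parameters for a handle lookup.

    With no handle, match every record that carries a handle at all,
    i.e. everything that came over from DSpace.
    """
    if handle is None:
        q = f"{HANDLE_FIELD}:*"
    else:
        # Quote the value so the slash in a handle is not parsed as syntax.
        escaped = handle.replace('"', '\\"')
        q = f'{HANDLE_FIELD}:"{escaped}"'
    return urlencode({"q": q, "fl": f"id,{HANDLE_FIELD}", "rows": rows})


if __name__ == "__main__":
    # "1234/5678" is a made-up handle for illustration only.
    print(handle_query_params("1234/5678"))
    print(handle_query_params())
```

Records without the field would then be works created natively in Hyrax, which is one way to separate migrated from new content.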
Lots of work types - 8 or 10 different types - and 11 admin sets, about a dozen
Can have literally any combination - all relative to the pre-work of DB analysis of the types of works in DSpace. Sub-collections - those mostly mapped to the work types and admin sets we'd designed.
400 collections - many of them were the same - biggest were ETDs. Easy to identify what could be processed as ETDs. Other works all needed to be evaluated, e.g. deciding that they belong in the Article admin set.
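A rough sketch of the collection-to-target mapping described above, assuming a simple lookup table. Every collection name, work type and admin set name here is a hypothetical placeholder, not OSU's actual mapping; anything not in the table is flagged for manual evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    work_type: str
    admin_set: str


# Hypothetical mapping from DSpace collection name to Hyrax target.
COLLECTION_MAP = {
    "Electronic Theses and Dissertations": Target("Etd", "ETD Admin Set"),
    "Faculty Articles": Target("Article", "Article Admin Set"),
    "Technical Reports": Target("TechnicalReport", "Reports Admin Set"),
}


def resolve(collection_name):
    """Return the Target for a collection, or None if it still needs review."""
    return COLLECTION_MAP.get(collection_name)


def unmapped(collection_names):
    """List collections that have no mapping yet and need to be evaluated."""
    return sorted(n for n in collection_names if n not in COLLECTION_MAP)


if __name__ == "__main__":
    names = ["Faculty Articles", "Misc Uploads", "Technical Reports"]
    print(resolve("Faculty Articles"))
    print(unmapped(names))
```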
Faculty and regular articles, theses for projects, technical reports, etc. - admin sets for reports - scripted a process to export bags for particular collections - kept track of what had been migrated.
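One way to "keep track of what had been migrated" is a small ledger file that the export script reads and updates. This is an illustrative sketch only: the `MigrationLedger` class, the JSON layout and the item identifiers are assumptions, not the script OSU used.

```python
import json
from pathlib import Path


class MigrationLedger:
    """Records which item ids have been migrated, per collection."""

    def __init__(self, path):
        self.path = Path(path)
        self.done = {}
        if self.path.exists():
            self.done = json.loads(self.path.read_text())

    def pending(self, collection, item_ids):
        """Return the ids from item_ids not yet recorded as migrated."""
        migrated = set(self.done.get(collection, []))
        return [i for i in item_ids if i not in migrated]

    def mark(self, collection, item_id):
        """Record one migrated item and persist the ledger immediately."""
        items = self.done.setdefault(collection, [])
        if item_id not in items:
            items.append(item_id)
        self.path.write_text(json.dumps(self.done, indent=2))
```

Writing the file after every item is slow but means an interrupted run can resume from where it stopped, which suits a migration that is shepherded over many days.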
60,000 items migrated. Took a long time. Fixing was quick. Migration script would run during business hours so someone could shepherd it through the process - background workers would make derivatives and would catch up. ~2,000/day was the max. Single worker process with Sidekiq processing background jobs.
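For a sense of scale, the two figures above give a lower bound on the migration length: 60,000 items at a best-case 2,000 per day. The sketch below only does that arithmetic; actual elapsed time would be longer, since 2,000/day was the maximum rather than the typical rate.

```python
import math


def min_migration_days(total_items, max_per_day):
    """Fewest working days needed if every day hits the maximum rate."""
    return math.ceil(total_items / max_per_day)


if __name__ == "__main__":
    print(min_migration_days(60_000, 2_000))  # 30
```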
Does Hyrax have everything DSpace had?
Nabeela: We should look at Hyrax and consider Gap Analysis
Josh: DSpace may have been old at OSU.
Collections and nested collections - managing admin sets as Collections
Question: Do we feel Hyrax has all the DSpace features?
Hui: Not 100%. There are basic features and more advanced features. The basics are the same. DSpace has a longer history, so the community developed features like batch import/export. In DSpace you can update metadata via .csv. OAI-PMH is in DSpace. Also bitstreams - harvesting content over the internet. DSpace 5 or 6 is creating research profile features - integration with author IDs/ORCID. These reflect different priorities: features that have been used widely aren't yet in Hyrax. Created some work-arounds to migrate to Hyrax, so not a loss of functions. Compared to 3, not a fair comparison.
OAI, REST API or SWORD - not used
In DSpace - SWORD-based ingestion form. Use DOI to find item. DSpace has SWORD running in parallel to Tomcat. Locally - developed an OAI-PMH branch for Hyrax - supports some OAI-PMH for Oregon Dig Lib.
Try to do the same thing, but required some dev.
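For reference, a harvester's request to an OAI-PMH endpoint looks roughly like the sketch below. The `ListRecords` verb and the `metadataPrefix`, `set` and `resumptionToken` arguments come from the OAI-PMH protocol; the base URL and set name are made up, and nothing here reflects how the local Hyrax branch is implemented.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    A resumptionToken is an exclusive argument: when continuing a
    harvest, it is sent on its own, without metadataPrefix or set.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and set name, for illustration only.
    print(list_records_url("https://repo.example.edu/oai", set_spec="etd"))
```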
Restrictions: IP restrictions -
Customized authentication code - possible to implement embargo
Meet as a group - look at challenges?
Hyrax doesn't have collections extension stuff - Josh and Hui working on that transition now. Some custom code isn't behaving.
- Must have/ should have analysis
Have OSU analysis done by next week - looking at where they're at
Not all actors are doing the right thing.
With a new version - re-create collections they've lost - it's in metadata now: the original DSpace Community, Collections and Handle are kept with each work. But kicking the can down the road until CE's work finishes, then evaluating a broad cleanup or change. That metadata is hidden.
In DSpace you can adjust metadata by Collection - labels are the same regardless of where you are by default, but you can update YAML files to change the gloss on titles. Can apply per work type.
For this migration - look at the schema and know your history - all the history is important - metadata folks or cataloguers will be looking at what we did
Ultimately a bridge to Hyrax - but if there's some way to incorporate this work in the information products - for IR use - CONTENTdm to Hyrax - Hyrax doesn't live up to cultural heritage repos.
OSU has things they'd do differently next time. And there's some metaprogramming going on. A pretty straightforward task, but it does get into the weeds in some metaprogramming.
Schedule a walk-through
In Scholars Archive - generate work types via generator - abstracted metadata into modules we could include. Properties themselves are shared across multiple work types - some work types are pretty similar.
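The idea of shared metadata modules included by several work types can be sketched as below. This is a Python analogy using mixin classes, not the Ruby modules used in the actual application, and all class and property names are hypothetical.

```python
class CoreMetadata:
    """Properties assumed to be common to every work type."""
    properties = ("title", "creator", "date_issued")


class DegreeMetadata:
    """Properties assumed to apply only to thesis-like work types."""
    properties = ("degree_name", "degree_level")


def collect_properties(cls):
    """Gather properties from every metadata mixin a work type includes."""
    seen = []
    for klass in cls.__mro__:
        # Look only at properties declared directly on each class.
        for prop in klass.__dict__.get("properties", ()):
            if prop not in seen:
                seen.append(prop)
    return seen


class Article(CoreMetadata):
    pass


class Etd(CoreMetadata, DegreeMetadata):
    pass


if __name__ == "__main__":
    print(collect_properties(Article))
    print(collect_properties(Etd))
```

Similar work types then differ only in which mixins they include, so a property is defined once and reused.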