Stanford is proposing two projects
As part of Stanford's participation in the first Linked Data for Libraries (LD4L) project, much effort was put into the conversion of its MARC data to linked data, in this case BIBFRAME. The bulk of a library's metadata is captured in MARC, and its conversion to linked data will be key to any successful shift in technology. This conversion process revealed a number of problem areas, however. MARC was intended not only to capture metadata about a resource, but also metadata about the metadata (for instance, when the metadata was created). These different types of metadata are difficult to sort out in the conversion to linked data. Likewise, many relationships, such as the link between a subject heading and a specific work, or the link between a performer and the particular work they performed, are not captured in MARC. It is assumed that the person viewing the data on a computer screen will make those connections. Because of this lack of internal connection within a MARC record, these relationships, and many others, will not be expressed in linked data after its conversion. All of them must be added by hand after conversion. It became clear that, to avoid this extra labor, libraries would need to begin creating their metadata as linked data directly. This realization drove Stanford to choose the conversion of its workflows as its LD4P institutional project. Although Stanford's production workflows will be unique to that institution, the elements in the production chain can be generalized and shared with other members of LD4P. In addition, agreements for support with common vendors such as Casalini Libri or Backstage will benefit all partners.
The individual elements in this new production workflow are tantalizingly familiar. Stanford has many years of history with the conversion of MARC data, both bibliographic and authority, to linked data. It has experience with a variety of triple stores and with the ingest of data, in-depth knowledge of RDF, and extensive experience with identifiers. It has also evaluated the Library of Congress's linked-data production tools, which are now sophisticated enough that LC will be training 40 of its own catalogers to use them. But even given this experience with the elements of the new workflow, concerted effort will be needed to bind them together into a workflow that will function in Stanford's environment.
Building upon the roadmap developed by BIBFLOW and its analysis of needs in a linked data environment, Stanford will focus these requirements on four key production pathways. Each pathway will be examined, from acquisition to discovery, as a tracer bullet. All key elements in those workflows will be converted to a process rooted in linked data, but in a basic way. Emphasis will be on the completeness of the pathway. The workflows themselves will be expanded in the future to account for additional complexities once the initial pathway has been established.
Stanford will be creating a parallel LOD processing stream to accomplish these goals. Resources flowing through these pathways will first be processed in the traditional way with traditional MARC or MODS metadata. This traditional metadata will be used for discovery purposes in the university's discovery environment and for contributions to cooperative cataloging programs such as the Program for Cooperative Cataloging (PCC). For LD4P, however, a parallel linked-data workflow will be created and duplicate metadata produced. This metadata will feed into a parallel discovery environment as well, so that the entire processing stream can be mimicked. The parallel metadata can also be sent to various library vendors and programs so that they can begin to adjust their businesses to incorporate linked data. Although this proposed solution will require duplicative effort, it will allow Stanford to experiment with an alternative pathway without being dependent on the results for discovery. It also has the benefit of testing the new pathway with actual library resources and staff, so that a true measure of the effort and cost to implement the new paradigm can be obtained.
In preparing for this grant proposal, Stanford held a series of meetings involving its Acquisitions Department, Metadata Department, and Digital Library Systems and Services (DLSS). Key to the analysis was the mapping of its MARC workflows, MODS workflows, and original linked-data workflows (including data from vendors, self-created linked data using LC's BIBFRAME Editor, etc.). The elements of Stanford's workflows, and how they interact with these data types, data stores (Symphony (the ILS), the Digital Repository (SDR3)), and the triple store (Store Cache), were carefully mapped out. An analysis was done of what tools and environmental changes were needed to allow the individual workflows to proceed as a tracer bullet, and of the amount of effort needed to finish the work.
The resultant analysis led Stanford to choose four key workflows for conversion to linked data, two for traditional materials and two for digital: copy cataloging through the Acquisitions Department, original cataloging, deposit of a single item into the Digital Repository, and deposit of a collection of resources into the Digital Repository. The tracer bullet will follow the life cycle of a resource, from its acquisition to discovery. Each process along the way will be converted to a linked-data strategy. These processes simply need to be good enough to support an experimental workflow. Requirements for a full production workflow will be gathered iteratively and passed on through regular meetings to LD4L-Labs for development. The development of this skeletal architecture will nonetheless demand a sizeable amount of effort. Through workflow analysis, we have identified eight key areas for initial development: implementation and enhancement of the LC MARC2BIBFRAME converter; functional installation of the LC BIBFRAME Editor; development of storage and caching mechanisms; development of a BIBFRAME bridge to the ILS; a BIBFRAME-to-Solr mapping for discovery in Blacklight; publishing of the linked-data output to the web; integration of the BIBFRAME Editor with the Digital Repository; and development of the systems architecture to answer such questions as which system is the database of record for resource metadata.
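One of these areas, the BIBFRAME-to-Solr mapping for Blacklight, can be sketched in miniature. Everything below is an illustrative assumption rather than Stanford's actual code or schema: the sample record, helper names, and Solr field names (`title_display`, `imprint_display`) are invented for the example, while `bf:Instance`, `bf:title`, `bf:mainTitle`, and `bf:provisionActivityStatement` are BIBFRAME 2.0 terms.

```python
# Hypothetical sketch of the BIBFRAME-to-Solr mapping step: converter
# output (BIBFRAME triples, shown here as plain Python tuples) is
# flattened into the flat field/value document that Solr indexes for
# Blacklight discovery. Field names are illustrative, not Stanford's schema.
BF = "http://id.loc.gov/ontologies/bibframe/"

triples = [
    ("inst:1", "rdf:type", BF + "Instance"),
    ("inst:1", BF + "title", "_:t1"),  # blank node for the bf:Title
    ("_:t1", BF + "mainTitle", "Linked Data for Libraries"),
    ("inst:1", BF + "provisionActivityStatement", "Stanford, 2016"),
]

def objects(triples, subject, predicate):
    """All objects of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def to_solr_doc(triples, inst):
    """Flatten one bf:Instance description into a Solr-style dict."""
    doc = {"id": inst}
    # Walk instance -> bf:Title blank node -> bf:mainTitle literal.
    for title_node in objects(triples, inst, BF + "title"):
        for main in objects(triples, title_node, BF + "mainTitle"):
            doc["title_display"] = main
    for stmt in objects(triples, inst, BF + "provisionActivityStatement"):
        doc["imprint_display"] = stmt
    return doc

print(to_solr_doc(triples, "inst:1"))
```

Because linked data is a graph rather than a flat record, most of the mapping effort lies in deciding which paths through the graph collapse into which index fields.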
Currently, 80% of Stanford's monographs come with some form of MARC copy. The material is received by lower-level paraprofessional staff in Acquisitions and cataloged on receipt. Because of the large volume of materials, throughput must remain high. As the transition of the entire library ecosystem to linked data will be slow, having at least one flow that begins with MARC data is inevitable. And because of its volume, the transition of this copy-cataloging workflow will be essential. The effort needed to convert this workflow has been analyzed. Of equal importance to the conversion of the workflow itself will be the broader questions that the Project Co-Manager can explore based on data gathered as resources are processed. The answers to these questions are not necessary for the successful completion of this phase of LD4P, but will prove invaluable for planning the next iteration of LD4P after these first two years are completed. For example:
Stanford produces original cataloging in all formats (books, serials, sound recordings, etc.) according to national standards, both for its own internal needs and to share with the broader library community. Although this pathway will make use of many of the same tools as Pathway 1 (e.g., a BIBFRAME Editor), the approach taken by these professional staff will be much different: they will be creating new linked data directly, not converting and enhancing MARC data. As in Pathway 1, of equal importance to the conversion of the workflow itself will be the broader questions that the Project Co-Manager can explore based on data gathered as resources are processed. For example:
Another major flow of metadata is for born-digital objects deposited into Stanford's digital repository and from there into the discovery environment. This metadata is currently stored and maintained in the MODS format. Pathway 3 explores the self-deposit of a single digital resource into the digital repository. Issues to be explored:
Stanford's digital repository also hosts an increasing number of large collections of digital objects. Metadata is often received in the form of a spreadsheet and is converted for deposit, then typically enriched or remediated afterwards. All processes must be automated, as the collections are large. Issues to be explored:
conversion of a large collection of metadata to linked data
automated remediation of the metadata either before or after processing
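The two issues above can be sketched together in a small example. The column names, the PURL base, and the date clean-up rule are illustrative assumptions for this sketch, not the repository's actual processing:

```python
import csv
import io

# Illustrative sketch of Pathway 4: collection metadata arrives as a
# spreadsheet, each row is converted to linked-data statements, and an
# automated remediation pass normalizes values before deposit.
# Column names and the date rule are assumptions for this example.
CSV_DATA = """druid,title,date
ab123cd4567,Field recordings,1972
ef890gh1234,Concert program,[1965?]
"""

def remediate_date(raw):
    # Automated remediation example: strip bracketed-uncertainty
    # punctuation so dates sort and facet cleanly.
    return raw.strip("[]?")

def row_to_triples(row):
    # One subject URI per object; the properties here are Dublin Core terms.
    subject = "https://purl.stanford.edu/" + row["druid"]
    return [
        (subject, "http://purl.org/dc/terms/title", row["title"]),
        (subject, "http://purl.org/dc/terms/date", remediate_date(row["date"])),
    ]

triples = []
for row in csv.DictReader(io.StringIO(CSV_DATA)):
    triples.extend(row_to_triples(row))

print(len(triples), "statements ready for deposit")
```

The point of the sketch is that both conversion and remediation are row-level, rule-driven operations, which is what makes full automation feasible for large collections.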
The Tracer Bullet projects will make use of common tools and environments as they proceed. A number of these tools, such as the BIBFRAME Editor, have already been developed and tested by the Library of Congress and will be used by them in their LD4P projects. Other elements, such as the triple store and the MARC conversion flow, have been developed by LD4L and are already in place. The Tracer Bullet Technologist will be responsible for integrating these tools into a single flow within the local Stanford environment. As the tools are used, optional enhancements will be uncovered that would improve their functionality outside of the simple tracer-bullet workflows.
The Performed Music Ontology Project is a collaborative effort of Stanford University, the Music Library Association (MLA), the Association for Recorded Sound Collections (ARSC), the Library of Congress, and the PCC, with participation of LD4P partner institutions. The project aims to develop a BIBFRAME-based ontology for performed music in all formats, with a particular emphasis on clarifying and expanding the modelling of works, events, and their contributors. Building on the work being completed by the Library of Congress in early 2016 and using BIBFRAME as a core ontology, the project members will collaborate to expand that core with domain-specific enhancements for use as a common standard by the library and archival communities, and to establish a model by which these extensions can be created, endorsed, and maintained by the community in the future. Along with the development of the new ontology, the project team will explore ways of enhancing existing MARC records to make their conversion to the newly developed ontology easier.
Linked data, including BIBFRAME, provides a major opportunity for describing performed music resources. The rich complex of associations in and among sound recording resources can be expressed through machine-linking of the data elements, and made available for further enhancement as linked open data on the Web. Without the restrictions imposed by the MARC format, more subtle relationships among performed music instances, the interrelationships of musical groups and musicians, and the relationships between works can all potentially be better expressed in linked open data.
BIBFRAME, however, does not in itself provide a complete model for expressing all these relationships, its initial creation having been based strongly on the contents of a MARC record, and thus inheriting some of MARC's inherent drawbacks for non-book resources. To best expose and exploit the relationships in a performed music resource, the BIBFRAME ontology needs to be extended, either through extending the ontology from within or by using pre-existing ontologies. BIBFRAME would thus be conceived of as a core ontology, to which additional domain-specific ontologies may be added. This approach is highly appealing, not only to communities interested in performed music but also to other specialist communities, in that they have the opportunity to better shape ontologies and vocabularies to respond to their particular needs and domain outlook. Ideally, using BIBFRAME as a core ontology can allow for data exchange and compatibility among different domains, while still being tailored to the needs of each.
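The core-plus-extension approach can be illustrated with a handful of RDF statements, written here as plain Python tuples. The `pmo:` namespace URI and the two extension terms are hypothetical placeholders invented for this sketch; `bf:Event` and `bf:musicMedium` are BIBFRAME 2.0 terms:

```python
# A minimal sketch, not a project deliverable, of treating BIBFRAME as a
# core ontology: the domain extension declares its own terms and ties
# them back to BIBFRAME with rdfs:subClassOf / rdfs:subPropertyOf, so
# generic BIBFRAME consumers can still interpret the extended data.
BF = "http://id.loc.gov/ontologies/bibframe/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
PMO = "http://example.org/performed-music/"  # hypothetical extension namespace

# Each statement is a (subject, predicate, object) triple.
extension = [
    # A performance is modelled as a specialized kind of bf:Event.
    (PMO + "Performance", RDFS + "subClassOf", BF + "Event"),
    # A refined medium-of-performance property hangs off the generic one.
    (PMO + "mediumOfPerformance", RDFS + "subPropertyOf", BF + "musicMedium"),
]

for s, p, o in extension:
    print(s, p, o)
```

Because the subclass and subproperty links point back into the core, an application that knows only BIBFRAME can still treat a `pmo:Performance` as a `bf:Event`, which is what makes data exchange across domains possible.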
The risk with this model for BIBFRAME development is that, without a cooperative effort in development and maintenance, conflicting ontologies may be developed for the same domain and data exchange hindered. The library community sustains itself by creating metadata to very specific standards that can be exchanged with little or no additional effort, the overall load of work being far too great for any one institution. To continue this successful collaboration in the linked-data sphere, it is vital that communities come together to create enhancements, agree upon a common ontology, and establish a model by which these enhancements can be developed and maintained by the community going forward.
The project will emphasize the refinement and extension of the BIBFRAME 2.0 model as developed by the Library of Congress. Special emphasis will be placed on the modelling of events, technical characteristics, and performers, and on the integration of the new ontology for medium of performance. Project team members are drawn from the partner institutions and serve as primary liaisons for the project; there will also be major contributions from, and testing by, catalogers and selected working groups within each organization. This collaboration of the primary stakeholders is vital in developing a successful, sharable standard.