ARKs (Archival Resource Keys) are high-functioning identifiers that lead you to things and to descriptions of those things. For example, this ARK,
https://n2t.net/ark:/67531/metadc107835/
gets you to a dissertation, and adding a '?' on the end of the ARK should get you to its description:
https://n2t.net/ark:/67531/metadc107835/?
On the internet, an identifier is a URL, or part of a URL. For example, this core ARK identifier,
  ark:/12148/btv1b8449691v/f29
appears inside two different URLs (Uniform Resource Locators, also known as web links or web addresses):
https://gallica.bnf.fr/ark:/12148/btv1b8449691v/f29
https://n2t.net/ark:/12148/btv1b8449691v/f29
ARKs are especially good at being persistent identifiers.
The average lifetime of a URL was once said to be 44 days. At the end of its life, a URL link breaks, meaning it gives you the dreaded "404 Not Found" error that most of us have seen. Irritating as that may be, it's politically awkward when looking for publicly funded research, and it's a cultural disaster for libraries, archives, museums, and other memory organizations.
A persistent identifier is a link that in principle keeps working far into the future, even as things move between websites. Normally when things move, everyone who ever recorded the old links would need to be told what the new links are, which is next to impossible. That's where identifier resolvers come in.
A resolver is a website that specializes in forwarding incoming identifiers (the ones originally advertised to users) to whichever websites are currently best able to deal with them. Overall, forwarding is called resolution; one step in a resolution process is called redirection.
For a resolver to work, its hostname must be carefully chosen so it won't ever need to be changed. Memory organizations, some of them centuries old, tend to have hostnames well-suited to be resolvers. Some well-known, younger resolvers are n2t.net (the ARK resolver), identifiers.org, doi.org, handle.net, and purl.org.
For anything and everything. Current uses of ARKs include
The only prerequisite is to fill out an online request for a NAAN on behalf of your organization. There is no charge to obtain a NAAN and all memory organizations are welcome. Within a day or two you should receive an email containing a NAAN for your organization's exclusive use. Meanwhile consider the following.
There's a partial list of software tools for persistent identification that includes
There are also some vendors, such as ezid.cdlib.org, and some more information on concepts and best practices.
It's like serving ordinary URLs. Incoming URL strings get mapped to content that you return, and if your resolver redirects ARKs to those URLs, you're all set. If you're dealing directly with incoming ARK strings, you can map (convert) them to a form your server handles (eg, map them to URLs on arrival). In this second case, your server is acting as a local resolver.
If you choose to run your own ARK infrastructure, you get complete autonomy at the expense of maintaining a server/resolver. On the one hand, you might run all custom infrastructure – including content management, web hosting, minting (generating unique identifier strings), and running your own server/resolver. That infrastructure could be very simple, such as a server configured to map incoming ARK-based URLs to server file pathnames. When you request your NAAN you will be asked to supply the base URL of your local server or resolver.
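As a rough illustration of the "very simple" end of that range, here is a minimal sketch of mapping an incoming ARK-based URL path to a server file pathname. The NAAN 12345, the content root /srv/ark-content, the function name, and the whitelist of allowed name characters are all assumptions made for the example, not part of any ARK specification or tool.

```python
import re
from pathlib import PurePosixPath

# Hypothetical values for illustration only.
NAAN = "12345"
CONTENT_ROOT = PurePosixPath("/srv/ark-content")

# Conservative whitelist for the name part (an assumption, not the ARK spec).
# The 5-digit NAAN follows the description in this FAQ.
_ARK_PATH = re.compile(r"^/ark:/?(\d{5})/([A-Za-z0-9._~/-]+)$")


def ark_path_to_file(url_path):
    """Map an incoming ARK-based URL path to a file pathname, or None."""
    match = _ARK_PATH.match(url_path)
    if match is None or match.group(1) != NAAN:
        return None  # not an ARK, or not under our NAAN
    # Split the name into path segments, ignoring empty ones (eg, a trailing '/').
    parts = [p for p in match.group(2).split("/") if p]
    # Refuse anything that could climb out of the content root.
    if not parts or any(p in (".", "..") for p in parts):
        return None
    return CONTENT_ROOT.joinpath(*parts)


print(ark_path_to_file("/ark:/12345/x2abc/f29"))
```

A real server would also need to decide what to return when the mapped file is missing, which is where a resolver's redirect table usually takes over.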
At the other extreme, you might work with a vendor that supplies all the infrastructure so that, for example, you can focus on creating content. Hybrid solutions are also common, such as taking your current web server arrangement and just adding an identifier management piece (eg, the API/UI provided by ezid.cdlib.org, which partners with n2t.net).
You will also want to think about whether to advertise (release, publish, disseminate) your ARKs based at your resolver or at n2t.net. You might choose the former for branding or the latter for stability. Resolving your ARKs through n2t.net is always possible, regardless of how you advertise them (this is a side-effect of obtaining a NAAN).
NAAN stands for Name Assigning Authority Number, which is a unique 5-digit number that begins (after the "ark:" label) every ARK. The NAAN identifies the organization that assigned the ARK and ensures that two different organizations can never assign the same ARK. If your NAAN were 12345 and your resolver were my.example.org, your ARKs would all start with
  https://my.example.org/ark:/12345/...
You may request a NAAN by filling out an online form. The NAAN you obtain will be listed alongside all other NAANs in the public NAAN registry, which you are free to browse through. Use that same form to update your registry entry, for example, if you make a change to the URL of your resolver, or if you have negotiated with another organization to carry on your work and take over your NAAN. If you transition into or out of a vendor relationship, there is no problem taking your NAAN with you.
NAANs subdivide the set of all possible ARKs (the ARK namespace). The subset of ARKs under a given NAAN can be further subdivided into shoulders (eg, 12345/x2, 98765/b4), which can make it easy to delegate autonomous ARK assignment to departments in a large organization. ARK resolution is loosely based on NAANs, but because organizations split, ARKs accommodate the namespace splitting problem by supporting management of a namespace by more than one organization.
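To make the NAAN and shoulder subdivision concrete, here is a small sketch that splits a core ARK into its NAAN and name, then looks up which department a shoulder was delegated to. The shoulder table, department names, and function names are hypothetical, and treating a shoulder as a plain string prefix is a simplifying assumption.

```python
# Hypothetical shoulder assignments under the example NAAN 12345.
SHOULDERS = {
    "12345/x2": "Special Collections",
    "12345/b4": "Research Data",
}


def split_ark(ark):
    """Return (naan, name) for a core ARK such as 'ark:/12345/x2abc'."""
    if not ark.startswith("ark:"):
        raise ValueError("not an ARK: " + ark)
    # Accept both 'ark:/12345/...' and 'ark:12345/...'.
    rest = ark[len("ark:"):].lstrip("/")
    naan, sep, name = rest.partition("/")
    if not sep or not naan or not name:
        raise ValueError("ARK needs a NAAN and a name: " + ark)
    return naan, name


def owner_of(ark):
    """Return the department whose shoulder prefixes this ARK, or None."""
    naan, name = split_ark(ark)
    full = naan + "/" + name
    matches = [s for s in SHOULDERS if full.startswith(s)]
    # If several shoulders match, prefer the longest (most specific) one.
    return SHOULDERS[max(matches, key=len)] if matches else None


print(owner_of("ark:/12345/x2abc"))
```

Because each department mints only under its own shoulder, two departments can assign names independently without coordinating on every string.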
These are the major persistent identifier types (or schemes). They have all been around since 2001 and they have much in common, starting with structure.
https://n2t.net/ark:/99999/12345
https://doi.org/10.99999/12345
https://handle.net/10.99999/12345
https://purl.org/99999/12345
https://<various>/urn:99999:12345
ARKs, DOIs, and Handles are all found in places like the Data Citation Index ℠ and ORCID.org profiles. As seen in these examples, they all have three parts: a protocol (https://) plus a hostname; a name assigning authority (99999, 10.99999, or purl.org/99999), which is the organization that created a particular identifier; and the name it assigned (12345).
And they all have little effect on persistence. See 10 persistent myths about persistent identifiers.
No, that's too strong a statement. But let's keep these identifier schemes (types) in perspective.
Given how little the schemes do for you, when choosing one you'll likely want to consider factors such as cost, risk, and openness.
The short answer is that ARKs are the only mainstream, non-siloed, non-paywalled identifiers that you can register to use in about 48 hours. DOIs, Handles, and PURLs require resolution and other services to come from their respective centralized systems (silos).
That's not to say that persistence is free. Making any identifier persistent burdens you, the provider, with the costs of content management, hosting, monitoring, and forwarding. You can do those things yourself or with help from a vendor. But with ARKs, just as with URLs, you will not be charged separately for your identifiers and you will not be locked in to a special-purpose resolution silo that also locks out other identifiers.
ARKs are unusual in being decentralized. While one can get resolution services from a global ARK resolver called n2t.net, over 90% of the ARKs in the world are published without reference to it. More than 500 registered organizations across the world have created an estimated 3.2 billion ARKs, and, as with URLs, no one has ever paid an identifier fee to create them. Of course maintaining them isn't free. It is never without cost to keep content access persistent in the long term, regardless of identifier type.
Here are some more differences between ARKs, DOIs, Handles, PURLs, and URNs.
"https://" protocol. When that first part of the identifier ceases to have meaning, only ARKs and URNs will include the label (eg, "ark:") indicating the type of identifier that remains.
ARKs have some unique features that support early object development: ARKs can be deleted, can be born with no metadata, and can exist with any metadata you care to store.
Being able to delete identifiers actually makes ARKs more trustworthy. The ability to delete is a vital part of healthy collection management that is denied to those non-ARK identifier types prohibiting deletion under the presumption that people, once they are asked to commit, won't make mistakes.
People armed with software regularly turn simple human errors into big tangles of systematic mistakes, even at the threshold of commitment. By making it difficult to clean them up, we force systems to drag those messes forward in perpetuity.
While not immune to such mistakes, ARKs have the big advantage that they can be created and deleted in the shadows, independent of release, publication, or archival commitment.
Yes. Sometimes having two identifiers is useful, although it can become confusing when it happens often. Many people start by assigning ARKs to each thing they create in order to have a stable reference right from the beginning, even before they know whether they want to publish it, let alone keep it. Starting with an ARK, you benefit from being able to keep the original identifier from birth through to public release as the object and its metadata mature. For the subset of things that you end up wanting to publish in places that require DOIs, you can assign DOIs at publication time. This is a way in which ARKs support early object development.
In such a scenario, to reduce the burden of maintaining both identifiers you could register the DOI to redirect to the ARK. At the cost of maintaining just one identifier (the ARK), this would keep newly published links and links previously stored and bookmarked by your collaborators from breaking.
There are no simple answers. Identifiers (not things, but their names) are tricky to talk about, so if you hear simple answers elsewhere, beware of common fallacies.
Nothing inherent in ARKs, DOIs, Handles, PURLs, or URNs makes them more or less fit for any particular field, domain, or sector. With an identifier resolver and administrative management service, they all provide the core service of resolution (and so do properly managed URLs).
Generalizations about identifier types sometimes apply when resolution and management for that type are locked into one particular vendor or provider. For example, many PURL and Handle features and restrictions are well-defined by their respective administration silos, as are those of DOIs, which are built on top of Handles. But DOIs have metadata practices that are diverse and evolving across different DOI registration agencies.
The concrete differences that we experience, such as metadata, landing pages, and tool integration (eg, publishing tools), are not properties of identifier schemes per se, but properties of resolution, management, and citation services that various providers extend to or withhold from different identifier types. Those services are shaped in turn by communities of practice and by markets. Basic services are founded on a reliable database storing each identifier along with metadata elements (creator, title, date, redirection URL, etc) that describe the identified object. Extra services include link checking, duplicate detection, report generation, and searching.
As of 2019, purely on an incomplete and anecdotal level, here are a few trends that have been observed.
Those are special kinds of persistent identifiers. ORCIDs (Open Researcher and Contributor Identifiers) only identify researchers, and they link to research works using ARKs, DOIs, etc. ORCIDs look like
https://orcid.org/0000-0001-7604-8041
ROR (Research Organization Registry) identifiers designate organizations. For example, here's the California Digital Library:
UUIDs are globally unique, 36-character strings that are easy for software to generate but only become usable as web addresses when made part of a URL such as this one:
https://somehost.example.com/3c2e39526-e0c3-41ae-be4f-07558a9458eb
While embedding a UUID in an ordinary URL makes it actionable ("clickable"), you could expect more if it were embedded in an ARK such as
https://n2t.net/ark:/65665/3c2e39526-e0c3-41ae-be4f-07558a9458eb
As an ARK, for example, that UUID should return metadata (if available) and be insensitive to the hyphens, making this form equally viable:
https://n2t.net/ark:/65665/3c2e39526e0c341aebe4f07558a9458eb
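The hyphen insensitivity described above can be sketched as a normalization step applied before two ARKs are compared or looked up. This is only an illustration of the idea: the function name is hypothetical, and the sketch works on the core ARK (the part starting with "ark:") rather than on a full URL, since hostnames may legitimately contain hyphens.

```python
def normalize_ark(core_ark):
    """Drop hyphens from a core ARK so hyphenated variants compare equal."""
    if not core_ark.startswith("ark:"):
        raise ValueError("expected a core ARK starting with 'ark:'")
    return core_ark.replace("-", "")


hyphenated = "ark:/65665/3c2e39526-e0c3-41ae-be4f-07558a9458eb"
plain = "ark:/65665/3c2e39526e0c341aebe4f07558a9458eb"

# Both forms reduce to the same lookup key.
print(normalize_ark(hyphenated) == normalize_ark(plain))
```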
When in my workflow should I create ARKs?
At object birth, or even before. We name our babies before they're born, and we name and refer to objects in the conception stages, sometimes long before they bear fruit. Depending on how elaborate the planning may be, your unborn objects could have full-function ARKs that resolve to an appropriate surrogate and return rich metadata, including persistence statements.
The only caveat is to be careful releasing (advertising) ARKs that have uncertain long-term prospects. Some identifier management systems have features to help manage and resolve unreleased identifiers (eg, EZID has a "reserved" status). The more people who know about an ARK, the harder it is to delete.
How is it that ARKs can be easy to delete?
If no one knows about an identifier but you, there's no harm in deleting or withdrawing it. Stepping back, an identifier is actually an assertion that a given string of characters is associated with a specific thing. The fewer people you tell, the easier it is to scrap that assertion. If you create a URL and share it only with your closest colleagues, that is much easier to withdraw than if the URL appeared for a month on a public website, from which it was harvested by internet search engines. In contrast, it is hard to delete DOIs and Handles because once registered and made resolvable, they are effectively released to the world.
ARKs behave like URLs in this respect. Providers are free to create and share ARKs narrowly, in which case they're easy to delete.
Perhaps surprisingly, even if shared more broadly, ARKs can come with persistence statements that tell you how much or how little commitment is made to them. ARKs were designed to articulate a variety of persistence statements, but they are certainly not alone among identifiers and objects that exhibit a variety of commitment "flavors". This is why ARKs are more accurately known as high-functioning rather than persistent identifiers.
Finally, people make mistakes. ARKs, DOIs, Handles, PURLs, and URNs are sometimes broadcast in error and need to be withdrawn. When that happens, provider best practice is to make the withdrawn identifier resolve to a page that explains and perhaps apologizes for the inconvenience. Despite the rumors, persistent identifiers are never guaranteed.
People need identifiers before they know exactly what object they refer to, or if they refer to anything worth keeping. An identifier that requires mature metadata cannot be created during early development since little is known about the object. So object creators almost always initially assign identifiers that have no metadata requirements, such as URLs or ARKs.
If you start with an ARK, you benefit from being able to keep the original identifier through to public release as the metadata matures. Many objects go through intensive development and revision phases, sometimes lasting years, during which they are too immature to meet most metadata requirements. Nonetheless every object needs some sort of identifier from conception to maturity, where maturity could look like public release and further enhancement, or abandonment. It is easy to abandon ARKs that have not been released into the world.
Like the object itself, metadata elements need a flexible place to grow and mature over time:
Unlike Crossref and DataCite DOIs, which require specific metadata (eg, see the DataCite schema), ARKs do not constrain any of these activities. Moreover the N2T.net resolver actually supports all of them.
Creating metadata (extra information associated with or describing an object) has several key benefits. First, no matter what the ARK redirects to – whether a landing page or a file – metadata gives users vital information about the object, such as references to newer versions, creation date, provenance, etc. For ARKs typically metadata is accessed via inflections.
Metadata also eases some persistence pain. By themselves, persistent identifier strings are often opaque, revealing little about what they identify (because non-opaque identifiers do not age or travel well). But opaque identifiers are difficult because they give you no clues as to what the identifiers were meant to identify. In the absence of metadata you are forced to access the object itself to remind yourself what it is, and to trust that it's the correct object. Metadata really helps. Moreover, discrepancies between returned metadata and the accessed object help everyone detect identifier changes and errors.
Metadata is for grownups, and is far less important for immature objects and their identifiers than for those that have been released. Metadata demonstrates basic provider credibility and commitment to high-functioning identifiers. Not every provider is up to this task.
It need not be expensive. Building metadata from scratch can be costly, but it's usually created and managed by object providers, in which case it can be leveraged efficiently for identifiers. Ideally, for strong persistence, master metadata (maintained by object providers) should be reflected in independent systems so that it is hard for someone to tamper undetectably with identifier associations. For example, digital object repositories that obtain ARKs and DOIs from the EZID service store a copy of their metadata with EZID.cdlib.org, which in turn stores another copy with the N2T.net resolver.
Metadata is messy business for all identifiers, not just ARKs. Across domains and object types there are thousands of standards, many of them overlapping yet conflicting, and each is applied according to local organizational customs and with varying levels of compliance. Choosing or creating a specification for your metadata depends on factors such as
Reliable cross-domain interoperation may remain out of reach, but Dublin Core, DataCite, Schema.org JSON-LD, and Dublin Kernel are common metadata specifications to consider for use with ARKs.
An ARK typically has a four-element kernel of highly generic metadata, followed by any other metadata elements (name/value pairs) the provider wishes to provide. Since 2001 ARKs were meant to be interoperably indexable across the kernel elements for all object types: digital, physical, and abstract. This was an unusually generic descriptive goal, shared by the seminal metadata standard, Dublin Core (DC), which nonetheless underlies most non-generic (domain-specific) metadata standards.
Kernel metadata is structured as if in answer to
There's much more to say about ARK metadata (available soon at arks.org) and too much to cover in a basic FAQ. Other elements are key, such as
An inflection is a change to the ending of a word to express a shift in meaning. It permits us to define a word such as "go" without also defining "goes" and "going". To an ARK that leads to an object, simply adding a '?' to the end (an example of an ARK inflection) permits us to request metadata without having to define a separate identifier for the object's metadata. This simple technique can be used by a human with a web browser. The N2T resolver supports both inflections and content negotiation.
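As a sketch of how the '?' inflection might be handled in software, the two helpers below build a metadata request from an ARK URL and, on the receiving side, separate a bare trailing '?' from the identifier. Both function names are hypothetical, and the rule that only a single trailing '?' counts as an inflection is an assumption made for this example; actual resolvers may detect inflections differently.

```python
def metadata_url(ark_url):
    """Append the '?' inflection to ask for metadata instead of the object."""
    return ark_url if ark_url.endswith("?") else ark_url + "?"


def split_inflection(raw):
    """Return (identifier, wants_metadata) for a raw identifier string.

    Only a bare trailing '?' is treated as an inflection here;
    something like 'id?x=1' is left alone as an ordinary query string.
    """
    if raw.endswith("?") and raw.count("?") == 1:
        return raw[:-1], True
    return raw, False


print(metadata_url("https://n2t.net/ark:/67531/metadc107835/"))
print(split_inflection("ark:/67531/metadc107835/?"))
```

The point of the design is that the object and its metadata share one identifier, so nothing extra needs to be minted or maintained for the description.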
Content negotiation for metadata is a kludgy software technique for requesting alternate formats of an object, such as the PDF or RTF form of an HTML file. Although not designed for it, historic "content negotiation" was contorted in certain contexts to request metadata under the startling assumption that formats often used to hold metadata are in fact metadata and will never be objects in their own right. Unlike inflections, "content negotiation for metadata" doesn't work at all for objects represented in those formats (the list of which is growing and known only by private agreement), nor is it easy enough to be used directly by most human users.
Although inflections are commonly associated with ARKs, they are not "owned" by ARKs. Contrary to popular belief, identifiers don't do anything – it's their resolvers that do or don't support such features. So, for example, inflections and suffix passthrough are supported by n2t.net for all identifier types, but not by doi.org or handle.net for any identifier types.
Most ARKs are created by organizations that advertise ("publish") them based at their own resolvers. For example, this ARK was published based at the gallica.bnf.fr resolver:
https://gallica.bnf.fr/ark:/12148/btv1b8449691v/f29
Having to run and maintain your own resolver is the cost of complete autonomy. Using your own resolver also lets you do branding via the hostname, the downside being that brands are transient and tend to make identifiers fragile. Political and even legal (eg, trademarks) pressures may make supporting older branded hostnames, hence their identifiers, difficult.
That's another reason for having the global ARK resolver. People coming across a broken identifier in the future may find its hostname no longer exists, and if it's an ARK they can extract the core identity (starting with "ark:") and present it to the global n2t.net resolver, as in
https://n2t.net/ark:/12148/btv1b8449691v/f29
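The recovery step just described, extracting the core identity and presenting it to n2t.net, can be sketched in a few lines. The function name is hypothetical; the sketch searches only the path portion of the URL for the "ark:" label so that a hostname or port cannot be mistaken for it.

```python
from urllib.parse import urlsplit


def rebase_to_n2t(url):
    """Extract the core ARK from a URL and rebase it at n2t.net."""
    parts = urlsplit(url)
    start = parts.path.find("ark:")
    if start == -1:
        raise ValueError("no 'ark:' label found in " + url)
    core = parts.path[start:]
    # Carry over any non-empty query string unchanged.
    if parts.query:
        core += "?" + parts.query
    return "https://n2t.net/" + core


print(rebase_to_n2t("https://gallica.bnf.fr/ark:/12148/btv1b8449691v/f29"))
```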
To avoid the risk of future inconvenience, an organization – even one that runs its own resolver – may choose
When demand for a global ARK resolver arose, basic principles of openness and generality prevented the designers from creating yet another silo in the DOI/Handle/PURL mold. Instead, the ARK resolver was built to be a generic, scheme-agnostic resolver called N2T (Name-to-Thing), which now resolves over 600 types of identifier, including ARKs, DOIs, Handles, PURLs, URNs, ORCIDs, ISSNs, etc. Resolution is essentially looking in a table for an identifier string, regardless of type, and redirecting it to the right place.
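The "looking in a table" description can be illustrated with a toy scheme-agnostic lookup. This is a sketch of the general idea only, not N2T's actual implementation: the table contents, target URLs, and function name are invented for the example, and lowercasing the scheme label is an assumption.

```python
# Hypothetical redirect table; the targets are illustrative, not real registry data.
REDIRECTS = {
    "ark:/12345/x2abc": "https://my.example.org/objects/x2abc",
    "doi:10.99999/12345": "https://publisher.example.org/articles/12345",
}


def resolve(identifier):
    """Return the redirect target for an identifier of any scheme, or None."""
    scheme, sep, rest = identifier.partition(":")
    if not sep:
        raise ValueError("identifier has no scheme label: " + identifier)
    # Treat the scheme label case-insensitively; leave the rest untouched.
    key = scheme.lower() + ":" + rest
    return REDIRECTS.get(key)


print(resolve("ark:/12345/x2abc"))
print(resolve("doi:10.99999/12345"))
```

Nothing in the lookup depends on which scheme the string belongs to, which is why one table can serve many identifier types.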
The same basic principles guided the design of an earlier tool called noid, which was built for ARKs but is also regularly used by organizations that mint Handles.
Typically, scheme-based services are designed as silos, or closed platforms, serving a particular identifier type such as Handle, DOI, or PURL. Each silo performs the same main functions – mapping names (identifier strings) to things (objects or metadata). Excluding all but one type of identifier string may help to capture markets, but it's wasteful and non-inclusive. It requires building the same set of services over and over for each type and violates basic principles of openness.
In contrast the N2T (Name-to-Thing) resolver and EZID (identifiers made easy) management interface were designed to work with all identifiers. Effort put into any new feature can be efficiently leveraged across all types, which sometimes creates surprising flexibility. For example, ARKs are often stored in EZID with "DOI metadata", and every DOI stored in N2T can benefit from "ARK resolution features" such as inflections and suffix passthrough, which are not available via the main DOI resolver (doi.org).