...

You are free to create ARK strings as you wish, provided you use only digits, letters (ASCII, no diacritics), and the following characters:

= ~ * + @ _ $ . /

The last two characters are reserved in the event you wish to disclose ARK relationships.

Another unique feature of ARKs is that hyphens ('-') may appear but are identity inert, meaning that strings that differ only by hyphens are considered identical; for example, these strings

ark:/12345/141e86dc-d396-4e59-bbc2-4c3bf5326152

ark:/12345/141e86dcd3964e59bbc24c3bf5326152

identify the same thing. The reason for this feature is that text formatting processes out in the world routinely introduce extra hyphens into identifiers, breaking links to any server that treats hyphens as significant.
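The character rules and hyphen insensitivity described above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation of the ARK syntax: the function names are hypothetical, and details such as percent-encoding are deliberately left out.

```python
import re

# Characters permitted after the "ark:" label, per the text above:
# ASCII letters, digits, and = ~ * + @ _ $ . / (hyphens may appear but are ignored).
_ALLOWED_BODY = re.compile(r"[A-Za-z0-9=~*+@_$./-]*")


def is_well_formed(ark: str) -> bool:
    """True if `ark` starts with 'ark:' and uses only the permitted characters."""
    if not ark.startswith("ark:"):
        return False
    return _ALLOWED_BODY.fullmatch(ark[len("ark:"):]) is not None


def identity_key(ark: str) -> str:
    """Drop hyphens, so strings that differ only by hyphens get the same key."""
    return ark.replace("-", "")


def same_ark(a: str, b: str) -> bool:
    """Compare two ARK strings while treating hyphens as identity inert."""
    return identity_key(a) == identity_key(b)
```

A server that stores and looks up ARKs by `identity_key` would treat the two example strings above as the same identifier.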

ARKs distinguish between lower- and upper-case letters, which makes shorter identifiers possible (52 vs 26 letters per character position). The "ARK way", however, is to use lower-case only unless you need shorter ARKs. The restriction makes it easier for resolvers to support your ARKs in case they arrive from the world with mixed- or upper-case letters, which happens regrettably often due to the lingering 50-year-old assumption that identifiers are case-insensitive. You might also consider using the character repertoire of the Noid tool, which creates transcription-safe strings using the strongest mainstream identifier check digit algorithm; it uses only digits and consonants minus 'l' (letter ell, often mistaken for the digit 1):

0123456789bcdfghjkmnpqrstvwxz
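To give a feel for how a check character over this repertoire can work, here is a minimal sketch of the scheme as the Noid documentation describes it, to the best of our understanding: each character's position in the repertoire is multiplied by its position in the string (counting from 1), the products are summed, and the sum modulo 29 selects the check character. The function name is hypothetical, and the assumption that the check covers the NAAN, the "/", and the name (without the "ark:/" label) should be verified against the Noid tool before relying on it.

```python
NOID_ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"  # 29 characters


def noid_check_char(s: str) -> str:
    """Compute a Noid-style check character for `s` (e.g. 'NAAN/name')."""
    total = 0
    for position, ch in enumerate(s, start=1):
        value = NOID_ALPHABET.find(ch)  # -1 for characters outside the repertoire
        total += position * max(value, 0)  # such characters count as zero
    return NOID_ALPHABET[total % len(NOID_ALPHABET)]


def has_valid_check_char(s: str) -> bool:
    """True if the last character of `s` is the check character of the rest."""
    return len(s) > 1 and noid_check_char(s[:-1]) == s[-1]
```

Because 29 is prime and each position has a different weight, a single mistyped character or a swap of two adjacent characters changes the expected check character.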

Regarding assignment, one common strategy is to leverage legacy identifiers. For example, a museum moth specimen number cd456f9_87 might be advertised under the ARK ark:/12345/cd456f9_87. Some legacy identifiers may need to be altered in view of ARK character restrictions. The second common strategy is to make up entirely new strings for your ARKs. In this case it is important to consider whether to make them opaque or non-opaque (or a bit of both).

What are opaque identifiers?

Persistent identifier strings are typically opaque, deliberately revealing little about what they're assigned to, because non-opaque identifiers do not age or travel well. Organization names are notoriously transient, which is why NAANs are opaque numbers. As titles and dates are corrected and word meanings evolve (eg, innocent older acronyms may become offensive or infringing), strings meant to be persistent can become confusing or politically challenging. The generation and assignment of completely opaque strings comes with risk too; for example, numbers assigned sequentially reveal timing information, and strings containing letters can unintentionally spell words (which is why vowels are missing from the recommended character repertoire).

...

ARKs are not required to be opaque, but it is recommended that the base object name be made opaque, since it tends to name the main focus of persistence. If any qualifier strings follow that name, it is less important that they be opaque. To help choose your approach to opacity, you may wish to consider compatibility with legacy identifiers and ease of string generation and transcription (eg, brevity, check digits). New strings can be created (minted) with date/time, UUID, and number generators, as well as Noid (Nice Opaque Identifiers) minters. 

Opaque strings are "mute" and therefore challenging to manage, which is why ARKs were designed to be "talking" identifiers. This means that if there's metadata, an ARK that comes in to your server with the '?' inflection should be able to talk about itself.

How do I make server content addressable with ARKs?

First, decide what the user experience of accessing your ARKs will be, for example, a spreadsheet file, a PDF, an image, a landing page filled with formatted metadata and a range of choices, etc. Whichever you choose, plan for your server to be able to respond with metadata if your ARK should arrive with a '?' inflection after it.

Otherwise, serving ARKs is like serving URLs. Normally incoming URL strings address (get mapped to) content that your web server returns. If your server is ARK-aware, incoming ARKs (expressed as URLs) must be mapped to the same content. A common approach is to map the ARK to the URL using a software table that you update whenever the URL changes. In this case your server is acting as a local resolver. If you don't want to implement this yourself, there are ARK software tools and services that can help.
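As a rough sketch of the "software table" approach, a local resolver can be reduced to a lookup plus a check for the '?' inflection. Everything here is hypothetical: the table contents, the URLs, the metadata fields, and the `resolve` function are placeholders, and a real server would sit behind a web framework and a database.

```python
from typing import Dict, Tuple

# Hypothetical ARK-to-URL table; update an entry whenever the content moves.
ARK_TO_URL: Dict[str, str] = {
    "ark:/12345/x5wf6789": "https://www.example.org/objects/wf6789.html",
}

# Hypothetical metadata returned when an ARK arrives with the '?' inflection.
ARK_METADATA: Dict[str, Dict[str, str]] = {
    "ark:/12345/x5wf6789": {"title": "Example object"},
}


def resolve(request_target: str) -> Tuple[int, str]:
    """Map an incoming request target to (HTTP status, redirect URL or body)."""
    wants_metadata = request_target.endswith("?")
    ark = request_target.rstrip("?").lstrip("/")
    if ark not in ARK_TO_URL:
        return 404, "unknown ARK"
    if wants_metadata:
        fields = ARK_METADATA.get(ark, {})
        return 200, "\n".join(f"{key}: {value}" for key, value in fields.items())
    return 302, ARK_TO_URL[ark]
```

For example, `resolve("/ark:/12345/x5wf6789")` yields a redirect to the current URL, while the same target with a trailing '?' yields the metadata text instead.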

Another approach is to run your web server without change, but instead of updating local tables, you would update ARK-to-URL mapping tables residing at a non-local resolver. Examples of this can be found among vendors and in any organization that updates tables via EZID.cdlib.org (which, due to a special relationship, updates resolver tables at n2t.net).

How do I cite or advertise an ARK?

The URL (https or http) form of the ARK is preferred, for example,

https://n2t.net/ark:/99166/w66d60p2

An ARK meant for external use is generally advertised (released, published, disseminated) in this way in order to be an actionable identifier. If a more compact visual display of an ARK is needed, it should be hyperlinked; for example, a compact display of an HTML hyperlink can be achieved with

<a href="https://n2t.net/ark:/99166/w66d60p2"> ark:/99166/w66d60p2 </a>

An important decision is whether your URL-based ARKs will use the hostname of your local resolver or the N2T.net resolver. If local control or branding is important enough, you would advertise ARKs based at your local resolver (see about serving content with ARKs). If you're concerned about the stability of your local hostname, you would advertise your ARKs based at n2t.net (see examples of both).

Resolving your ARKs through N2T is always possible for users, regardless of how you advertise them.
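If you generate citations programmatically, the URL form and the compact hyperlinked form can both be derived from the bare ARK. The sketch below assumes hypothetical helper names and defaults to the n2t.net resolver; pass your own hostname to advertise ARKs based at a local resolver.

```python
from html import escape


def ark_url(ark: str, resolver: str = "https://n2t.net") -> str:
    """Return the actionable URL form of an ARK at the given resolver."""
    return resolver.rstrip("/") + "/" + ark


def ark_link(ark: str, resolver: str = "https://n2t.net") -> str:
    """Return an HTML hyperlink that displays the compact ARK."""
    href = escape(ark_url(ark, resolver), quote=True)
    return '<a href="{}">{}</a>'.format(href, escape(ark))
```

Calling `ark_url("ark:/99166/w66d60p2")` gives the URL shown above, and `ark_url("ark:/99166/w66d60p2", "https://example.org")` gives the same ARK based at a (hypothetical) local resolver.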

...

 ARK ANATOMY                  Core Immutable Identity
                         ________________________________
                        /                                \
       Resolver Service   Base Object Name    Qualifiers
     __________________  _________________  _____________
    /                  \/                 \/             \
    https://example.org/ark:/12345/x54xz321/s3/f8.05v.tiff
            \_________/ \__/ \___/ \______/\____/\_______/
                 |       |     |   |   |     |       |
                 |     Label   |   |   |   Sub-parts  Variants
                 |             |   |   |
 Name Mapping Authority (NMA)  |   |   Assigned Name
                               |   |
                               |   +---------- Shoulder: /x5
                               |
                 Name Assigning Authority Number (NAAN)

...

Yes, ARKs can be assigned at any level of granularity, such as to a manuscript, to chapters inside it, to chapter sections, subsections, etc. An ARK can also be assigned to a thing that encloses other things. In ARKs the character '/' is reserved to help the recipient understand about containment, for example, the first object below contains the second:

ark:/12148/btv1b8449691v

ark:/12148/btv1b8449691v/f29

That's the containment qualifier. There's only one other ARK qualifier, and it indicates variant forms of a thing by using the reserved character '.' in front of a suffix. For example, if these ARKs identify documents,

ark:/12148/btv1b8449691v/f29.pdf

ark:/12148/btv1b8449691v/f29.html

because they differ only by the suffix .pdf or .html, it can be inferred that they identify two different forms of the same document.
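The two qualifier types lend themselves to simple string handling. The following is a minimal sketch with hypothetical names; it assumes that variant suffixes come last and that the base object name contains no '.' of its own, which holds for the examples above but is not checked.

```python
from typing import List, NamedTuple


class ParsedArk(NamedTuple):
    naan: str
    base_name: str
    sub_parts: List[str]  # containment qualifiers, introduced by '/'
    variants: List[str]   # variant qualifiers, introduced by '.'


def parse_ark(ark: str) -> ParsedArk:
    """Split a bare ARK into NAAN, base object name, sub-parts, and variants."""
    if not ark.startswith("ark:/"):
        raise ValueError("not an ARK: " + ark)
    naan, _, rest = ark[len("ark:/"):].partition("/")
    # Everything before the first '.' is the name plus any sub-parts.
    path, dot, suffixes = rest.partition(".")
    base_name, *sub_parts = path.split("/")
    variants = suffixes.split(".") if dot else []
    return ParsedArk(naan, base_name, sub_parts, variants)


def contains(outer: str, inner: str) -> bool:
    """True if `inner` names something inside `outer` under the '/' rule."""
    return inner.startswith(outer.rstrip("/") + "/")
```

With this sketch, the two document ARKs above parse to the same NAAN, base name, and sub-part, and differ only in their `variants` list.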

What is the purpose of the NAAN?

NAANs subdivide the set of all possible ARKs, which is called the ARK namespace. By obtaining a NAAN, an organization gains the exclusive right to create ARKs using the NAAN as a kind of "prefix", in other words, all ARKs starting with it. That set of ARKs is infinite and is known as the NAAN's namespace. For example, the Internet Archive's NAAN namespace is all ARKs starting with "ark:/13960/". These NAAN-based prefixes effectively subdivide the ARK namespace into non-overlapping sub-namespaces, each one holding an infinite number of possible ARKs. Since organizations can only create ARKs in their own namespaces, there can be no conflicting ARK assignments between organizations.

The NAAN also plays a key role in resolution. For example, if the N2T.net resolver cannot find an incoming ARK in its database, it looks at its NAAN and redirects the ARK to the local resolver registered with the NAAN. Similarly, a local resolver may receive incoming ARKs (presumably not from N2T) with NAANs it doesn't know about and may choose to redirect them to N2T.
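The fallback behavior described above amounts to a two-step lookup: first the full ARK, then the NAAN. The sketch below is only an illustration of that idea, not N2T's actual implementation; the function names and both tables are hypothetical.

```python
from typing import Dict, Optional


def naan_of(ark: str) -> Optional[str]:
    """Return the NAAN of a bare ARK, or None if the string is not an ARK."""
    if not ark.startswith("ark:/"):
        return None
    naan = ark[len("ark:/"):].split("/", 1)[0]
    return naan or None


def resolve_with_fallback(
    ark: str,
    known_arks: Dict[str, str],      # hypothetical ARK-to-URL database
    naan_resolvers: Dict[str, str],  # hypothetical NAAN-to-local-resolver registry
) -> Optional[str]:
    """Return a redirect target: an exact match first, else the NAAN's resolver."""
    if ark in known_arks:
        return known_arks[ark]
    base = naan_resolvers.get(naan_of(ark) or "")
    if base is None:
        return None
    return base.rstrip("/") + "/" + ark
```

A local resolver could apply the same pattern in the other direction, forwarding ARKs with unfamiliar NAANs to N2T instead of returning an error.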

Speaking of namespaces, in principle there is a sub-namespace associated with every prefix, even very long ones. For example, the full ARK for any object can be viewed as a prefix, with an infinite number of ARKs – naming object parts and variants – that can descend from it. In practice, the two most common prefix-based sub-NAAN namespaces are associated with objects and with something called "shoulders" (below).

Your NAAN namespace is made up of all possible ARKs that start with "ark:/" followed by your NAAN and a "/". ARK resolution is loosely based on NAANs, but also accommodates further subdivision of a NAAN namespace into shoulders. Effectively, each subset that is defined by how its ARKs start is its own infinite namespace, whether the subdivision is at the NAAN, shoulder, or even object level.

Set of all ARKs starting with   Namespace defined                 Example ARK in that namespace
ark:/                           ARK namespace (all ARKs)          ark:/99999/fk4gt2m
ark:/12345/                     12345 NAAN namespace              ark:/12345/p987654
ark:/12345/x5                   12345/x5 shoulder namespace       ark:/12345/x5wf6789
ark:/12345/x5wf6789/            12345/x5wf6789 object namespace   ark:/12345/x5wf6789/c2/s4.pdf

Can I make changes to a NAAN?

You can change the registry entry for a NAAN by filling out the same online form used for requesting a new NAAN. For security purposes requests are processed manually. Example reasons for a change may include

  • notifying N2T that your organization's contact person or resolver URL will change,
  • updating your organization's name assignment policy (sample policy),
  • requesting an additional NAAN for a significant new body of ARKs or new organizational division, and
  • transitioning your NAAN to another organization that will carry on your work and take over your NAAN.

By the way, if your organization transitions into or out of a vendor relationship, there is no impediment to taking your NAAN with you.

What is a shoulder?

A shoulder is a sub-NAAN namespace commonly used to help keep a NAAN namespace organized. It is the set of all ARKs starting with a fixed prefix that adds a few characters after the NAAN, and unlike the NAAN it is not terminated by a "/". Shoulders allow ARK assignment operations in a NAAN namespace to be delegated to autonomous projects, just as NAANs do for the ARK namespace. Even if an organization has but one project at first, it rarely knows how it will want to use its namespace in the years ahead. Setting up each project under its own shoulder makes it impossible for any project's assignments to conflict with those of another – present, past, or future – because they take place in non-overlapping namespaces. Shoulders can also ease the namespace splitting problem.

In the ARK string, a shoulder usually appears as a "/" followed by a letter and a digit, coming right after the NAAN. For example, the shoulder is "/x5" in:

ark:/12345/x5wf6789/c2/s4.pdf

Why is there no "/" to mark the end of this important sub-namespace (ie, why isn't the shoulder "/x5/")? There are several reasons.

  • According to ARK rules, if you published ark:/12345/x5/wf6789/c2/s4.pdf, the "/" after "/x5" implies (a) that ark:/12345/x5 is also an object and (b) that the object named ark:/12345/x5/wf6789/c2/s4.pdf is contained in it. This is likely untrue, except in a way that has little meaning to the end user.

  • It's very tempting to add that "/" because it makes the shoulder boundary obvious to in-house ARK administrators, but it doesn't help the end user, who is uninterested in the structure of your assignment operations, and you may waste time having to re-explain that the shoulder is not actually an object that contains everything in the shoulder namespace.

  • In fact ARK administrators will always know where their shoulder boundary is if they follow the "first-digit convention" when choosing a shoulder, namely, a sequence of letters ending in a digit. The shoulder is then everything up to and including the first digit encountered after the NAAN.
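The first-digit convention makes the shoulder boundary easy to recover in software as well. Here is a minimal sketch, assuming shoulders consist of zero or more ASCII letters followed by one digit; the function names are hypothetical, and ARKs whose names do not follow the convention simply yield no shoulder.

```python
import re
from typing import Optional

# "ark:/", the NAAN, a "/", then letters up to and including the first digit.
_SHOULDER = re.compile(r"ark:/([^/]+)/([A-Za-z]*[0-9])")


def shoulder_of(ark: str) -> Optional[str]:
    """Return the shoulder (e.g. '/x5') under the first-digit convention."""
    match = _SHOULDER.match(ark)
    return "/" + match.group(2) if match else None


def shoulder_namespace(ark: str) -> Optional[str]:
    """Return the prefix shared by every ARK on the same shoulder."""
    match = _SHOULDER.match(ark)
    return "ark:/{}/{}".format(match.group(1), match.group(2)) if match else None
```

Any ARK that starts with the string returned by `shoulder_namespace` belongs to that shoulder's namespace, so a plain prefix test is enough to tell which project assigned it.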

...


Are there restrictions on the use of NAANs?

...