In 1998, when the original paper detailing the Fedora architecture was published, there were many more open questions around the boundaries of responsibility of a Fedora repository and how the services the architecture could offer fit into the nascent repository application stack. In the subsequent 18 years, the maturation of both the software ecosystem surrounding a Fedora repository and the real-world requirements defined by the experience of our community’s use cases has clarified Fedora’s role in supporting the goals of preservation and access in the context of big data and performance at scale.
A Fedora repository is a digital object repository with Web-facing capabilities for publishing objects and backend facilities for managing and preserving those objects. The resources exposed on the Web must abide by established principles of RESTful HTTP as first-class participants in the read-write Web. Fedora must also enable bit-level preservation of the resources it manages. These access and preservation functions must maintain response performance as the number of objects, the number of bytes, the size of individual bitstreams, and the number of client requests all scale up.
It has become clear that attempting to address all of the requirements from the variety of Fedora installations results in an implementation that only partially satisfies any given institution, at extra cost to all. This has led to the organizational decision to return the focus of the Fedora repository to its essential, core capabilities while encouraging decoupled integration with external services. Common external services include validation-on-ingest, bulk ingest/retrieval/edit, resource transformation, and derivative generation. This approach embraces the assumption that the Fedora repository is one piece in an institution’s larger infrastructure ecosystem.
However, there is still enough diversity of requirements around scale, storage, and performance that the expectation that a single Fedora implementation will satisfy all requirements is optimistic at best. The reference Fedora implementation will target many Fedora use cases, but the option must be available to introduce alternate Fedora implementations without impacting repository client code or integrations. For example, applications that interact with a Fedora repository should not be concerned with the implementation idiosyncrasies required by the profile of the managed resources.
This speaks to the need for a well-specified application programming interface (API) that provides a stable layer of abstraction between Fedora clients and repository instances. In this way, alternate Fedora implementations suited for specific service profiles can all expose the same core of services to repository clients.
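The idea of a stable abstraction layer can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen for illustration only: the `Repository` contract, its `put` and `get` methods, the two toy backends, and the `archive_report` client function are assumptions, not part of any Fedora API. The point is only that client code written against the contract runs unchanged over different backends.

```python
import os
import tempfile
from typing import Dict, Protocol


class Repository(Protocol):
    """Hypothetical minimal contract; not the actual Fedora API."""

    def put(self, path: str, content: bytes) -> None: ...
    def get(self, path: str) -> bytes: ...


class InMemoryRepository:
    """Toy backend that keeps resources in a dictionary."""

    def __init__(self) -> None:
        self._resources: Dict[str, bytes] = {}

    def put(self, path: str, content: bytes) -> None:
        self._resources[path] = content

    def get(self, path: str) -> bytes:
        return self._resources[path]


class FilesystemRepository:
    """Toy backend that stores each resource as a file under a root directory."""

    def __init__(self, root: str) -> None:
        self._root = root

    def _file_for(self, path: str) -> str:
        # Flatten the resource path into a single file name under the root.
        return os.path.join(self._root, path.strip("/").replace("/", "__"))

    def put(self, path: str, content: bytes) -> None:
        with open(self._file_for(path), "wb") as handle:
            handle.write(content)

    def get(self, path: str) -> bytes:
        with open(self._file_for(path), "rb") as handle:
            return handle.read()


def archive_report(repo: Repository, name: str, body: bytes) -> bytes:
    """Client code: depends only on the Repository contract, not on a backend."""
    path = f"/reports/{name}"
    repo.put(path, body)
    return repo.get(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for repo in (InMemoryRepository(), FilesystemRepository(root)):
            # The same client call works against either backend.
            assert archive_report(repo, "2016", b"annual report") == b"annual report"
```

In this sketch the client function never learns which backend it is talking to, which is the property a well-specified API is meant to guarantee.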
This separates two previously merged concepts: Fedora as an API and Fedora as the implementation of that API. This can be thought of as analogous to how relational databases or RDF triplestores service common SQL and SPARQL-Query requests in the same way despite different backend implementations. A well-defined space of shared responsibility will exist between repository services and the clients that consume those services. Furthermore, the Fedora API specification will be precise enough to allow for automated verification by a technology compatibility kit (TCK), a suite of tests to be run against a prospective implementation. A Fedora TCK would provide a means by which alternate implementations of the Fedora API could be verified as “doing Fedora”.
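As a rough illustration of how a compatibility kit works, the sketch below runs a fixed set of checks against any candidate implementation and reports which checks fail. It is a toy under stated assumptions: the `put`/`get`/`delete` operations, the two candidate classes, and the check names are all hypothetical, and a real Fedora TCK would presumably exercise an implementation over HTTP rather than through in-process method calls.

```python
from typing import Callable, Dict, List


class ConformingRepository:
    """Toy candidate that satisfies every check below (hypothetical)."""

    def __init__(self) -> None:
        self._resources: Dict[str, bytes] = {}

    def put(self, path: str, content: bytes) -> None:
        self._resources[path] = content

    def get(self, path: str) -> bytes:
        return self._resources[path]  # raises KeyError when the path is absent

    def delete(self, path: str) -> None:
        del self._resources[path]


class LossyRepository(ConformingRepository):
    """Toy candidate with a deliberate defect: it truncates stored content."""

    def put(self, path: str, content: bytes) -> None:
        super().put(path, content[:4])


def check_round_trip(repo) -> None:
    # Stored bytes must come back unchanged.
    repo.put("/a", b"payload")
    assert repo.get("/a") == b"payload"


def check_delete_removes(repo) -> None:
    # A deleted resource must no longer be retrievable.
    repo.put("/b", b"x")
    repo.delete("/b")
    try:
        repo.get("/b")
    except KeyError:
        return
    raise AssertionError("deleted resource is still retrievable")


CHECKS = [check_round_trip, check_delete_removes]


def run_tck(factory: Callable[[], object]) -> List[str]:
    """Run each check against a fresh instance; return the names of failed checks."""
    failures = []
    for check in CHECKS:
        try:
            check(factory())
        except AssertionError:
            failures.append(check.__name__)
    return failures


if __name__ == "__main__":
    print(run_tck(ConformingRepository))  # no failures expected
    print(run_tck(LossyRepository))       # the round-trip check should fail
```

Because the checks depend only on the specified behavior, the same suite can be pointed at any number of candidate implementations without modification.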
Although Fedora 4 has offered a production-ready REST API since late 2014, the Fedora API specification has yet to be formalized. The API specification is the most fundamental of several specifications for core services of a Fedora repository:
As these services are not specific to the Fedora community, we have the opportunity to take advantage of the efforts made by much larger communities of practice. To the degree that this is possible, the Fedora API specification will reuse existing standards, so that existing standards-based client libraries can be used to interact with the Fedora API. Likewise, on the server side, existing open source components can be reused in Fedora implementations. The more closely Fedora’s API specification aligns with standards, the more reuse is possible, resulting in less burden on the Fedora community for long-term maintenance. Additionally, the alignment with standards opens the door to greater interoperability with the broader Web, which will engender greater exposure and use of repository resources.
As noted, the rationale for defining a Fedora API specification includes stability for clients, freedom for server-side implementation, and Web interoperability. Currently, there is a single Fedora implementation, built over an open source JCR implementation. Although it is possible to extract a Fedora API specification from a single implementation, having two or more implementations of the API is necessary to prove with greater certainty that the API is indeed a generalization, an abstraction from concrete reality. Extracting an API from a single example creates an API that is biased towards that example. It is conceivable that alternate Fedora implementations will emerge, each chosen for its particular strengths, based on such technologies as Apache HBase, Apache Cassandra, Apache Marmotta, various triplestores, or POSIX filesystems. The advent of alternative implementations of the Fedora API specification will provide confirmation of the utility and generality of the API abstraction.
Driven by a firmer understanding of the Fedora repository’s responsibilities, a desire to limit the cost of client software, an interest in broader interoperability, and a need to support diverse use cases, the Fedora API specification represents a significant milestone for the project and the community.