The lack of a simple synchronous search within Fedora rules out certain interaction patterns that some members of the community consider desirable.  On the other hand, building extremely sophisticated and comprehensive search support into Fedora pushes the application towards the "monolithic" design pattern, which makes it harder to optimize for performance and scale.  By making simple synchronous query support an optional, specified extension to Fedora, with a lightweight implementation in the community Fedora implementation, we can support these alternate interaction patterns without imposing undue burdens on future Fedora implementations or significant performance impacts on the community implementation.


Desired Features

Query Example: Find the item(s) with a given dc:identifier.

Purpose Served: When performing a migration or large ingest operation that may be halted or interrupted, being able to identify whether a particular item was ingested without storing information outside of the repository would allow resumption of the process.  For example, your ingest script could be roughly:

  1. Does item X exist in the repository (where X is identified by having a particular dc:identifier)?
  2. If not, ingest it.
  3. If so, go to the next item.

Notes: While the example uses dc:identifier, it wouldn't be a problem to expand this to any directly applied user-managed triple.
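
The resumable ingest loop described above could look roughly like the following Python sketch. It is only an illustration: `resumable_ingest`, `InMemoryRepository`, `find_by_identifier` and the `/objects/` path scheme are hypothetical names invented here, and the in-memory class stands in for whatever search endpoint a Fedora implementation might eventually expose.

```python
from typing import Callable, Dict, Iterable, List


def resumable_ingest(
    items: Iterable[Dict[str, str]],
    find_by_identifier: Callable[[str], List[str]],
    ingest: Callable[[Dict[str, str]], None],
) -> List[str]:
    """Ingest only the items whose dc:identifier is not yet in the repository.

    Returns the identifiers that were ingested during this run.
    """
    newly_ingested = []
    for item in items:
        identifier = item["dc:identifier"]
        # Step 1: does an item with this dc:identifier already exist?
        if find_by_identifier(identifier):
            # Step 3: it does, so go to the next item.
            continue
        # Step 2: it does not, so ingest it.
        ingest(item)
        newly_ingested.append(identifier)
    return newly_ingested


class InMemoryRepository:
    """Hypothetical stand-in for a repository with simple synchronous search."""

    def __init__(self) -> None:
        # Maps a dc:identifier value to the paths of the resources carrying it.
        self._paths_by_identifier: Dict[str, List[str]] = {}

    def find_by_identifier(self, identifier: str) -> List[str]:
        return list(self._paths_by_identifier.get(identifier, []))

    def ingest(self, item: Dict[str, str]) -> None:
        # The path scheme is an assumption made for this sketch only.
        path = "/objects/" + item["dc:identifier"]
        self._paths_by_identifier.setdefault(item["dc:identifier"], []).append(path)


if __name__ == "__main__":
    repo = InMemoryRepository()
    batch = [{"dc:identifier": "a1"}, {"dc:identifier": "b2"}, {"dc:identifier": "c3"}]
    repo.ingest(batch[0])  # simulate a run that was interrupted after the first item
    print(resumable_ingest(batch, repo.find_by_identifier, repo.ingest))
```

Because the existence check is answered by the repository itself, rerunning the script after an interruption skips whatever was already ingested, and no progress log has to be kept outside the repository.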

Known Implementation Issues

The Modeshape implementation dynamically generates many of the triples in its output: for example, rdf:type triples based on whether a resource is a container or a binary, membership triples, and triples generated by Direct or Indirect Containers.  These generated triples are not persisted anywhere that would make them easy to search.
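
To make the issue concrete, here is a toy Python model of the problem. It is not how Modeshape works internally: `ToyRepository`, its methods and its index layout are all assumptions made for illustration. It shows only that a lookup built over persisted, user-supplied triples cannot see triples that are computed when a resource is read.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str]  # (predicate, object) for a given subject


class ToyRepository:
    """Toy model: user triples are stored and indexed, type triples are generated."""

    def __init__(self) -> None:
        self._stored: Dict[str, Set[Triple]] = defaultdict(set)
        self._kind: Dict[str, str] = {}  # subject -> "container" or "binary"
        self._index: Dict[Triple, Set[str]] = defaultdict(set)

    def put(self, subject: str, kind: str, user_triples: Iterable[Triple]) -> None:
        self._kind[subject] = kind
        for triple in user_triples:
            # Only persisted, user-supplied triples reach the search index.
            self._stored[subject].add(triple)
            self._index[triple].add(subject)

    def describe(self, subject: str) -> Set[Triple]:
        # The rdf:type triple is computed at read time and never stored.
        ldp_type = "ldp:Container" if self._kind[subject] == "container" else "ldp:NonRDFSource"
        return self._stored[subject] | {("rdf:type", ldp_type)}

    def find(self, predicate: str, obj: str) -> List[str]:
        return sorted(self._index.get((predicate, obj), set()))


if __name__ == "__main__":
    repo = ToyRepository()
    repo.put("/rest/a", "container", [("dc:identifier", "X1")])
    print(repo.describe("/rest/a"))               # includes the generated rdf:type triple
    print(repo.find("dc:identifier", "X1"))       # the user-supplied triple is findable
    print(repo.find("rdf:type", "ldp:Container")) # the generated triple is not
```

In this model, supporting search over generated triples would mean either persisting them at write time or evaluating the generation rules at query time, and that is the cost this section is pointing at.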


3 Comments

  1. I think all of the use cases I have are covered by the general case of the "find by dc:identifier" use case: find objects that have a user-supplied triple with the given predicate and object.

    Some specific instances are:

    • Find by "local identifier", such as the catalog record number, ID from the previous system, etc.
    • Find by type, such as all objects that have a given (user-supplied) rdf:type, or tagged with a given Samvera model
    • Find by user, such as all objects deposited by the current user
  2. I like the idea of this being an optional item, since we'd likely just have a triple store tacked onto the side of Fedora to do this type of query.

    1. Actually, given the simplicity of these query patterns, I would not be surprised if we could bring back a subset of the previous query feature implementation, focused on the specific use cases here.

      https://github.com/fcrepo4/fcrepo4/pull/620