2017-03-09 FileSets WG Meeting

This wiki space is now deprecated. The wiki for Samvera (formerly the Hydra Project) is now to be found here: Samvera

Date and Time

March 9, 2017, 2pm EDT

Connection Information

Google Hangouts:  https://hangouts.google.com/hangouts/_/artic.edu/pcdm-filesets

Moderator: Stefano Cossu

Notetaker: Stefano Cossu

Attendees

Agenda

  1. Notetaker?
  2. Review last week's action items
  3. Review code progress
  4. Further discussion on wireframes?
  5. Discuss public facing name for FileSets
    1. Should convey difference between Files and FileSets
    2. Should be easy to understand and memorize by non-technical users
  6. Next steps

Minutes

  1. Prev. action items: 
    1. PCDM validator needs some additional work before submitting to the community
    2. Also some additional info gathering needed for file association questions
    3. Targeting next week to complete this
    4. This can be pitched for Dev congress + LDCX
  2. Work on validator code
    1. yml file decoupled from QA but it will be possible for individual implementers to plug QA in
    2. Expose ability to set custom validation config file 
    3. Other uses outside of pcdmuse: namespace are possible 
    4. This file is currently focusing on validation config; UI config can be tackled later and likely in a separate place
    5. This is basically a generic RDF type validator; we could use some existing AF machinery
      1. With the addition that we need to check that a file associated with a given use (RDF) type exists
    6. Whenever you change your type, AF will need to be able to tell the types that should not be removed
  3. Wireframes
    1. Create page: Adam Wead already implemented this in LAKEshore; should be possible to extrapolate the code from LAKEshore and put in Hyrax
      1. We can do all validation in the UI via AJAX and display error messages if validation fails at any point
      2. Have a separate service to handle validation logic
    2. Ran out of time to comment on other pages; no particular red flag at a quick glance
  4. Engaging other developers
    1. Complete validation PR
    2. Define a clear path and present to dev congress 
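
The validator described under item 2 of the minutes (a generic RDF type check driven by a YAML file, plus a check that a file exists for each required use type) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `UseTypeValidator`, the config keys `allowed_types` and `required_types`, and the shape of the file hashes are all hypothetical and are not taken from the working group's code.

```ruby
require 'yaml'

# Hypothetical config file contents; the key names are assumptions.
CONFIG_YAML = <<~YAML
  allowed_types:
    - "http://pcdm.org/use#OriginalFile"
    - "http://pcdm.org/use#ThumbnailImage"
    - "http://pcdm.org/use#ExtractedText"
  required_types:
    - "http://pcdm.org/use#OriginalFile"
YAML

# Generic RDF type validator driven entirely by a config hash,
# so implementers can point it at their own YAML file.
class UseTypeValidator
  def initialize(config)
    @allowed  = config.fetch('allowed_types', [])
    @required = config.fetch('required_types', [])
  end

  # files: array of hashes such as { name: 'a.tif', types: ['http://...'] }
  # Returns an array of error messages; an empty array means valid.
  def errors_for(files)
    errors = []
    # Every type on every file must be in the allowed list.
    files.each do |file|
      (file[:types] - @allowed).each do |type|
        errors << "#{file[:name]}: type #{type} is not allowed"
      end
    end
    # At least one file must exist for each required type.
    present = files.flat_map { |f| f[:types] }
    (@required - present).each do |type|
      errors << "no file with required type #{type}"
    end
    errors
  end
end

validator = UseTypeValidator.new(YAML.safe_load(CONFIG_YAML))
```

Nothing here is tied to the pcdmuse: namespace; any list of type URIs could be supplied in the config.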

Action Items

  • Andrew Myers: complete validation PR
  • Andrew Myers: post file-to-file associations question to the tech list
  • Adam Wead: summarise the rdf:type issue to the tech list; what problems would the solution cause

6 Comments

  1. Hey all, today I asked the Hydra Tech call attendees about validating rdf:types in the context of pcdm:use. Please see notes from this week's Hydra Tech call for some details: Hydra Tech Call 2017-03-15

    And here's my interpretation of that conversation (repeating it for my own sake):

    • As Adam Wead pointed out during our last meeting, the way that Files are added to FileSets individually (via jobs) means that there currently is not an easy way to validate the rdf:types of all Files within a FileSet at once, on the server side.
      • Instead, server-side validation will likely occur as each new File is added to the FileSet
      • This may result in half-baked FileSets, in the event that a newly added File has an rdf:type that fails validation rules.
      • There are also no transactions, and thus no concept of "rolling back".
      • So once we end up with a half-baked FileSet, we would have to implement a rollback on our own, if we wanted to go that far (which I suspect would be a lot of work).
    • On the front-end, however, we can validate the rdf:types for all Files within a FileSet at once.
      • This can help avoid half-baked FileSets when they are being created through the UI.
      • But it might be a lot of work.
      • Also, it implies two distinct validation processes (one for the front-end, and one for the back-end) that do not mirror each other. In other words...
        • If you are using the UI, we may be able to validate rdf:type for all Files at once. But...
        • ... if you are using a rake task, or API call, to ingest objects into your Hyrax app, then you are stuck with validating rdf:types one-file-at-a-time. Unless...
        • ... we figure out a way to front-load the validation before queuing the jobs to add Files to FileSets.
    • The validation stuff in ActiveFedora is only for validating rdf:types on models in the context of AF Associations.
      • You define a :type_validator when you set up the association, e.g.: https://github.com/projecthydra/active_fedora/blob/b02059a48107123ad30e13e59c7b6de7b0eb3fa4/spec/integration/associations_spec.rb#L69
      • The only validator class that AF provides is a NullValidator, whose only purpose is to provide a class with a .validate! class method that does nothing. I assume this is because AF code is just pulling whatever class is in :type_validator and running .validate! on it. So the NullValidator is there to honor that contract.
      • Also, I'm guessing that somewhere in the AF code, it's checking for :type_validator on models associated with pcdm:hasMember.
        • But if we were to use this code, we would want to check for :type_validator on models associated with pcdm:hasFile, because that's how Files are associated with FileSets.
        • ^ and I'm not sure how much work that would be, without digging further.
    • The validators in Hydra-PCDM are highly opinionated.
      • They all are relying on methods, which are defined in Hydra-pcdm modules, which are then mixed into Hydra PCDM models.
      • There is no way to change the behavior of the current Hydra-pcdm validators via configuration, and that's what we're wanting to do.
    • The validators at the Hydra-Works layer are using ActiveFedora :type_validator interface
      • But as mentioned above, the AF :type_validator interface is only for validating rdf:types in the context of AF associations.
    • I also chatted with David Chandek-Stark (Duke) about https://github.com/projecthydra-labs/hydra-validations.
      • He says he is still using it.
      • It is not supposed to have any dependencies on Hydra (only ActiveModel::Validations).
      • See their README for how it's used to validate rdf:type.
    • After this discussion, I'm starting to think that sticking as close as possible to ActiveModel::Validations is a nice way to go.
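
To make the "front-load the validation before queuing the jobs" idea above concrete, here is a rough sketch. It is an assumption-laden illustration, not Hyrax code: `FileSetIngest`, `ONLY_KNOWN_TYPES`, and the in-memory `queued` array (standing in for whatever job queue is actually used) are all hypothetical names.

```ruby
# Illustrative list of permitted rdf:types (an assumption for this sketch).
ALLOWED_TYPES = [
  'http://pcdm.org/use#OriginalFile',
  'http://pcdm.org/use#ThumbnailImage'
].freeze

# A validator is anything that takes the whole batch and returns error messages.
ONLY_KNOWN_TYPES = lambda do |files|
  files.reject { |f| (f[:types] - ALLOWED_TYPES).empty? }
       .map { |f| "#{f[:name]} has an unrecognised rdf:type" }
end

class FileSetIngest
  attr_reader :queued

  def initialize(validator)
    @validator = validator
    @queued = [] # stand-in for a real job queue
  end

  # Validate the whole batch first; enqueue nothing unless everything passes.
  # This avoids a half-baked FileSet without needing any rollback.
  def submit(files)
    errors = @validator.call(files)
    return errors unless errors.empty?

    files.each { |f| @queued << f } # one "job" per File, as today
    []
  end
end
```

Because the check runs before anything is queued, the same entry point could in principle serve the UI, a rake task, or an API call, which would sidestep the two-validation-paths concern raised above.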


    THOUGHTS?

  2. Well, I have two questions. One, would front-loading validation for rake tasks and API calls really be that difficult for people? Submit a group of files, validate them all, then pass them to the jobs one by one? Honestly wondering.

    Two, what do we gain from using Hydra-Validation as a model? Looks good at first glance, though ... Does it let us pass in our validations as configuration, or solve some of the other specific problems above? And where would it go, would this be apart from either Hydra Works or PCDM? (Or are these things we'd need to suss out.)

    My thoughts were all questions (smile)


  3. "Front-loading" the validation would involve gathering the rdf:types of ALL Files within a FileSet, and running the FileSet's validator (which will be ours to write).

    Currently, Files are added to FileSets one at a time, using jobs (at least, afaik). As such, getting the rdf:types of all the Files before they are saved is tricky. But this is what we'd have to do.

    The thing I like about hydra-validations is that it uses ActiveModel::Validations, specifically the validates_with method, which allows you to specify a custom Validator class for running the validation. I think this is a good way for us to go as well.

    The difference for us would be that our custom Validator class would:
    1) be able to be configured (either through a YAML file, or through an initializer)
    2) handle more complex validation logic.

    Currently, the Hydra::Validations::InclusionValidator from hydra-validations simply extends the validates_inclusion_of macro to handle attribute values that are arrays, and returns true if that array is a subset of a known set of values. (See the hydra-validations README for a better explanation of this.)

    We want to do that. But we also want to throw in validations of cardinality, and required vs. optional.

    So I'm thinking we'd come up with something similar to Hydra::Validations::InclusionValidator, but I would recommend that we make it more specific, something like what I sketched in this gist: https://gist.github.com/afred/dc99d18e9fe8dc1c77969b3df91ab0a9
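
For readers without access to the gist, the following is a separate, independent sketch of the kind of validator described in this comment: configurable, and covering cardinality and required vs. optional types. It is written in plain Ruby so that it runs on its own; the `validate(record)` method only mirrors the shape of an ActiveModel::Validator, and `FileSetTypeValidator`, `FileSetStub`, and the `RULES` format are hypothetical. A real version would presumably subclass ActiveModel::Validator and report through `record.errors.add`.

```ruby
# Rules keyed by rdf:type. min >= 1 makes a type required; max caps
# cardinality (nil means unlimited). The rule format is an assumption.
RULES = {
  'http://pcdm.org/use#OriginalFile'   => { min: 1, max: 1 },
  'http://pcdm.org/use#ThumbnailImage' => { min: 0, max: 1 },
  'http://pcdm.org/use#ExtractedText'  => { min: 0, max: nil }
}.freeze

# Minimal stand-in for a FileSet model: one rdf:type per File.
FileSetStub = Struct.new(:file_types) do
  def errors
    @errors ||= []
  end
end

class FileSetTypeValidator
  def initialize(rules)
    @rules = rules
  end

  # Inspect the record, append messages to its errors, return true if valid.
  def validate(record)
    counts = record.file_types.each_with_object(Hash.new(0)) { |t, h| h[t] += 1 }

    # Inclusion: every type present must be a known one.
    (counts.keys - @rules.keys).each { |t| record.errors << "unknown type #{t}" }

    # Required vs. optional, and cardinality.
    @rules.each do |type, rule|
      n = counts[type]
      record.errors << "#{type}: at least #{rule[:min]} required, found #{n}" if n < rule[:min]
      record.errors << "#{type}: at most #{rule[:max]} allowed, found #{n}" if rule[:max] && n > rule[:max]
    end
    record.errors.empty?
  end
end
```

The rules hash could equally be loaded from a YAML file or set in an initializer, which is the configurability point made in 1) above.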

  4. For the UI part, are we thinking of using the 'requirements' section? https://docs.google.com/drawings/d/1C1ayWBwPS6XWsuaTkxIqqWxhE3COuQZ3LcG7LmjRpBA/edit

    Presumably that would avoid any need to get into jobs and actors in Hyrax.

    The Hydra::Validation code looks like a nice approach to me.

    1. Julie,

      You seem to have the need for slightly broader customization of this form, which seems to amount to the ability to configure a number of additional fields for individual Files uploaded in a FileSet. 

      While we want to make sure that this flexibility doesn't get abused by implementers in terms of number and diversity of additional fields, I think your approach is very valid. If there were a way to designate some fields in the content model so that they appear in the file upload form, including the use type by default, would that be a good starting point? 

      1. Hi Stefano,

        Focussing on 'type' is definitely the right approach at this point - I don't want to complicate or slow down work with requirements I haven't fully worked out yet.

        What I was wondering is how we'd go about hooking the validation into the 'requirements' section on the UI so that it wouldn't allow a save until all of the files had a type and the 'required' type was selected (or whatever the validation config needs). I haven't looked at the code yet to see how this works, but I'm assuming it's javascript.

        J.