VIVO Harvester

Fetch

JDBC Fetch

  • Review Database Structure
    • Identify Keys
    • Identify Foreign Keys
    • Dump the whole table
    • Result Sets (Query or Point to Stored Procedure)
    • Blacklist and whitelist of tables to ingest
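
The structure review above (keys, foreign keys, and a blacklist/whitelist of tables) can be sketched roughly as follows. This is only an illustration, not the Harvester's JDBC Fetch code: it uses Python's built-in SQLite driver in place of a JDBC connection, and the function name `describe_tables` and the sample tables are hypothetical.

```python
import sqlite3


def describe_tables(conn, whitelist=None, blacklist=()):
    """Map each table allowed for ingest to its primary key and foreign keys."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    tables = [row[0] for row in cur.fetchall()]
    result = {}
    for table in tables:
        # A whitelist, when given, names the only tables to ingest.
        if whitelist is not None and table not in whitelist:
            continue
        # A blacklist always excludes the tables it names.
        if table in blacklist:
            continue
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = cur.execute(f'PRAGMA table_info("{table}")').fetchall()
        key_columns = sorted((c for c in columns if c[5] > 0), key=lambda c: c[5])
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        fks = cur.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        result[table] = {
            "primary_key": [c[1] for c in key_columns],
            # (local column, referenced table, referenced column)
            "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in fks],
        }
    return result


# Hypothetical sample schema, used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    CREATE TABLE audit_log (entry TEXT);
    """
)
structure = describe_tables(conn, blacklist={"audit_log"})
for table, info in structure.items():
    print(table, info)
```

With a real source database the same questions would be answered through the driver's metadata calls rather than SQLite pragmas.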

OAI Fetch

  • Working for CiteSeer
    • Double-checked: not working for CiteSeerX, but working for standard CiteSeer using the URL cs1.ist.psu.edu/cgi-bin/oai.cgi. Successful harvest saved in XMLVault/OAI/CiteSeer.xml. -DRS
  • Working for UF IR (unconfirmed)
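
For reference, a harvest request against an OAI endpoint such as the CiteSeer one above could be built along these lines. This is a minimal sketch, not the Harvester's OAI Fetch code: the `http://` scheme is an assumption (the note above gives only the host and path), `build_oai_request` is a hypothetical helper, and `oai_dc` is the metadata format every OAI-PMH repository is required to support.

```python
from urllib.parse import urlencode


def build_oai_request(base_url, verb="ListRecords", metadata_prefix="oai_dc",
                      resumption_token=None):
    """Build an OAI-PMH request URL for the first or a follow-up page."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent together with the verb and nothing else.
        params = {"verb": verb, "resumptionToken": resumption_token}
    else:
        params = {"verb": verb, "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Assumption: plain http; the source lists only host and path.
CITESEER_BASE = "http://cs1.ist.psu.edu/cgi-bin/oai.cgi"

first_request = build_oai_request(CITESEER_BASE)
next_request = build_oai_request(CITESEER_BASE, resumption_token="abc123")
print(first_request)
print(next_request)
```

The token value `abc123` is a placeholder; a real harvest would read the `resumptionToken` element from each response and repeat until the repository returns none.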

HTTP Fetch

  • Clean and make Generic or Extensible
  • Configuration for PubMed
  • Configuration for UF IR

PubMed SOAP

  • Method cleanup

Translate

  • New node/attribute detection in XSL

Translate Standard Schemas

  • Microformats
    • hCard
    • hGrant
    • hResume
    • hCalendar
  • vCard
  • vCalendar
  • iCal
  • eduPerson
  • eduOrg

RDF Workflow

  • Work with Brian Lowe to pull RDF Workflow from VIVO into a library
  • Add RDFWorkflow to Jena Library
  • Utilize new library in translate methods for harvester

Score

  • Remove record handler ingest and instead only work with models
    • Utilize transfer for loading input models
  • Disambiguate authors
  • Add generic field-matching function
    • Needs to allow a progressive match on name, zip, affiliation, possibly co-author, etc.
  • Fix Jena create Model performance
    • Explore/Implement Jena SDB
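
One possible reading of the "progressive match" idea for the generic field-matching function is sketched below: candidates are narrowed one field at a time, in priority order, until a single candidate remains. This is an assumption about the intended behavior, not the Score implementation; `progressive_match`, `normalize`, and the sample records are hypothetical.

```python
def normalize(value):
    """Lower-case and collapse whitespace; empty values become None."""
    if value is None:
        return None
    text = " ".join(str(value).lower().split())
    return text or None


def progressive_match(record, candidates, fields):
    """Narrow candidates field by field; return (matches, fields used)."""
    remaining = list(candidates)
    matched_fields = []
    for field in fields:
        value = normalize(record.get(field))
        if value is None:
            continue  # the record has nothing to compare for this field
        narrowed = [c for c in remaining if normalize(c.get(field)) == value]
        if not narrowed:
            break  # this field rules everyone out; keep the previous set
        remaining = narrowed
        matched_fields.append(field)
        if len(remaining) == 1:
            break  # unambiguous, no need to test further fields
    if not matched_fields:
        return [], []  # nothing matched on any field
    return remaining, matched_fields


record = {"name": "J. Smith", "zip": "32611", "affiliation": "University of Florida"}
candidates = [
    {"id": "a", "name": "j. smith", "zip": "32611", "affiliation": "University of Florida"},
    {"id": "b", "name": "J. Smith", "zip": "10001", "affiliation": "Columbia University"},
    {"id": "c", "name": "A. Jones", "zip": "32611", "affiliation": "University of Florida"},
]
matches, used = progressive_match(record, candidates, ["name", "zip", "affiliation"])
print([c["id"] for c in matches], used)
```

Here the name alone leaves two candidates, and the zip code then resolves the ambiguity, so the affiliation is never consulted. Exact equality after normalization is the simplest comparison; a real scorer would likely weight fields or tolerate near matches.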

Algorithms

  • Pairwise
  • Neural
  • Regex

Configuration

  • Must allow for parameters to be passed in from command line for any algorithm
  • Must allow for process flow and order dependency (possible solution is to pipe output and run score multiple times)

Transfer

  • Query for Ontology Version
  • Translate up to current Ontology Version
  • Update/Overwrite/Append to Models
  • Integrate with Data Provenance
    • Separate graphs for each harvested data source
    • Include metadata about the source of the data
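
The provenance items above (one graph per harvested source, plus metadata about that source) could be represented as named-graph quads, roughly as sketched here. This is an illustration under assumptions, not the Transfer design: all `example.org` URIs are placeholders, the helper names are hypothetical, and recording the origin with the Dublin Core `source` property is one choice among several.

```python
DCTERMS_SOURCE = "http://purl.org/dc/terms/source"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value):
    """Format an IRI as an N-Quads term."""
    return f"<{value}>"


def literal(value):
    """Format a plain string literal, escaping backslash, quote and newline."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def quads_for_source(graph_uri, source_uri, triples):
    """Place triples in one named graph and record where the graph came from."""
    graph = iri(graph_uri)
    lines = [f"{s} {p} {o} {graph} ." for s, p, o in triples]
    # Provenance statement about the graph itself, kept inside that graph.
    lines.append(f"{graph} {iri(DCTERMS_SOURCE)} {iri(source_uri)} {graph} .")
    return lines


# Placeholder URIs for illustration only.
pubmed_quads = quads_for_source(
    "http://example.org/graph/pubmed",
    "http://example.org/source/pubmed",
    [(iri("http://example.org/pub/1"), iri(RDFS_LABEL), literal('A "quoted" title'))],
)
for line in pubmed_quads:
    print(line)
```

Keeping each source in its own graph means a re-harvest can replace that graph wholesale without touching data from other sources.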

Qualify

  • Configuration examples

Utilities

  • SDB for the VIVO Harvester (Jena Connect)
  • Fix argument parsing for model overrides (i.e., in transfer, "input-model": "model name for input (overrides config file)")
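
The override behavior described in the argument-parsing item can be sketched as follows: a value given on the command line replaces the one from the config file, and the config value stands when the argument is omitted. This is only an illustration in Python's `argparse`, not the Harvester's parser; exposing the setting as `--input-model` and the `resolve_settings` helper are assumptions.

```python
import argparse


def resolve_settings(argv, config):
    """Merge command-line arguments over values loaded from a config file."""
    parser = argparse.ArgumentParser(prog="transfer")
    parser.add_argument(
        "--input-model",  # assumed flag spelling for the "input-model" setting
        default=None,
        help="model name for input (overrides config file)",
    )
    args = parser.parse_args(argv)
    settings = dict(config)  # copy, so the loaded config is left untouched
    if args.input_model is not None:
        # Only an explicitly supplied argument overrides the config value.
        settings["input-model"] = args.input_model
    return settings


config_from_file = {"input-model": "fromConfig"}  # hypothetical config content
unchanged = resolve_settings([], config_from_file)
overridden = resolve_settings(["--input-model", "staging"], config_from_file)
print(unchanged, overridden)
```

Using `None` as the default is what lets the code tell "argument not given" apart from "argument given", which is the distinction an override needs.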

GUI

  • Point to VIVO Harvesters on external systems by entering their connection information and locating their configuration settings
  • View Config Files
  • View Logs
    • Tabulate data from logs
    • Graphical elements to display statistical data
  • Provide configuration wizards

Additional Libraries

  • D2R Map

Data Sources

  • NIH
    • Grants
    • PubMed publications
  • NSF
    • Grants
  • Generic government sites
    • Grants.gov
    • ClinicalTrials.gov
  • Scopus
  • ISI (depending on agreements)
  • Sakai

Testing

  • Implement JUnit testing
    • Ensure JUnit testing works
  • Create development, staging, and production servers for Harvester testing

Process

  • Build complete release script
  • Implement release process for Harvester
    • Create set of rules for staging
    • Create set of rules for commits
    • Create set of rules for packaging
  • Implement release process for virtual appliances
    • Create set of rules for updating
    • Create set of rules for releasing

Building

  • Implement a local Maven repository to work around the 303 bug in Maven
  • Add RPM to Maven build

Packaging

  • Fix issues with virtual machines
  • Ensure virtual machine creation is done as part of release (find a way to automate/semi-automate?)

Documentation & Demoing

  • How-tos with pictures
  • Example of a running People Harvest on a public server (vivo.ctrip.ufl.edu)
  • Example of a running PubMed Harvest on a public server (vivo.ctrip.ufl.edu)

Community

  • Create tutorials on SourceForge usage and development integration
    • How to FTP upload
    • How to SSH in
    • Maven/Javadoc integration

External Interfacing

  • Installing Joseki How-To
  • Installing Sesame How-To
  • Drupal How-To
    • SPARQL End Point
    • Linked Data
  • WordPress How-To
    • SPARQL End Point
    • Linked Data
  • Sakai
    • SPARQL End Point
    • Linked Data
  • Example of a SPARQL end point (vivo.ctrip.ufl.edu)
  • Example of using a SPARQL end point (ctrip.ufl.edu?)

VIVO Authentication

  • Establish Framework for Plug-and-Play Authentication Modules
    • Kerberos
    • Shibboleth
    • Active Directory (AD)
    • LDAP
  • Ensure integration of authentication systems with the authorization system of VIVO (VIVO group levels)
  • GUI
    • Security Record Viewer (view security logins from the VIVO application)
    • Security Set-up
      • link authentication groups to security levels in VIVO
      • specify the type of authentication (Shibboleth, Kerberos, whatever is installed)
      • Install necessary tools on server from app (such as Shibboleth)
      • Modify security files such as shibboleth.xml

VIVO Packaging

Targeted Formats

  • Amazon Cloud
  • VMware
  • Debian Package
  • RPM Package
  • WAR File

Processes

  • Automated Release Process
    • Work with Jim so that a single button press against the release code builds all VMs and other packages