VIVO Harvester
Fetch
JDBC Fetch
- Review Database Structure
- Identify Keys
- Identify Foreign Keys
- Dump the whole table
- Result Sets (Query or Point to Stored Procedure)
- Black and White List for tables to ingest
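The JDBC Fetch steps above (review the structure, identify keys and foreign keys, filter tables by list, dump a table) can be sketched roughly as follows. This is only an illustration, not the Harvester's implementation: it uses Python's built-in sqlite3 module as a stand-in for a JDBC connection, and the names review_schema, dump_table, whitelist, and blacklist are hypothetical.

```python
import sqlite3

def review_schema(conn, whitelist=None, blacklist=()):
    """Return {table: {"keys": [...], "foreign_keys": [...]}} for allowed tables.

    A table is skipped if a whitelist is given and the table is not on it,
    or if the table is on the blacklist.
    """
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    schema = {}
    for table in tables:
        if whitelist is not None and table not in whitelist:
            continue
        if table in blacklist:
            continue
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        keys = [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')
                if col[5] > 0]
        # foreign_key_list rows: (id, seq, table, from, to, on_update, ...)
        fks = [(fk[3], fk[2], fk[4])
               for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        schema[table] = {"keys": keys, "foreign_keys": fks}
    return schema

def dump_table(conn, table):
    """Dump the whole table as a list of {column: value} dicts."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur]

if __name__ == "__main__":
    # Made-up demo data; table and column names are illustrative only.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                             dept_id INTEGER REFERENCES department(id));
        CREATE TABLE audit_log (entry TEXT);
        INSERT INTO department VALUES (1, 'Chemistry');
        INSERT INTO person VALUES (10, 'A. Smith', 1);
    """)
    for table, info in review_schema(conn, blacklist={"audit_log"}).items():
        print(table, info, dump_table(conn, table))
```

In a JDBC setting the same information would presumably come from the driver's metadata rather than SQLite pragmas; the filtering and dumping logic would stay much the same.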
OAI Fetch
- Working for CiteSeer
- Double-checked: not working for CiteSeerX; working for standard CiteSeer using URL cs1.ist.psu.edu/cgi-bin/oai.cgi. Successful harvest in XMLVault/OAI/CiteSeer.xml -DRS
- Working for UF IR
HTTP Fetch
- Clean up and make generic or extensible
- Configuration for PubMed
- Configuration for UF IR
PubMed SOAP
- Method cleanup
Translate
- New node/attribute detection in XSL
Translate Standard Schemas
- Microformats
- hCard
- hGrant
- hResume
- hCalendar
- vCard
- vCalendar
- iCal
- eduPerson
- eduOrg
RDF Workflow
- Work with Brian Lowe to pull RDF Workflow from VIVO into a library
- Add RDFWorkflow to Jena Library
- Utilize new library in translate methods for harvester
Score
- Remove record handler ingest and instead only work with models
- Utilize transfer for loading input models
- Disambiguate authors
- Add generic field-matching function
- Need to allow for progressive matching on name, zip, affiliation, possibly co-author, etc.
- Fix Jena create Model performance
- Explore/Implement Jena SDB
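One possible reading of the generic field-matching and progressive-match items above is sketched below. It is an assumption-laden illustration, not the Harvester's scoring code: records are plain dicts, and normalize, progressive_match, best_match, and the min_fields threshold are all hypothetical names.

```python
def normalize(value):
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(str(value).lower().split())

def progressive_match(candidate, existing, fields=("name", "zip", "affiliation")):
    """Compare two records field by field, in order.

    Returns how many leading fields agree; stops at the first field that
    is missing on either side or that disagrees.
    """
    matched = 0
    for field in fields:
        a, b = candidate.get(field), existing.get(field)
        if a is None or b is None or normalize(a) != normalize(b):
            break
        matched += 1
    return matched

def best_match(candidate, records, fields=("name", "zip", "affiliation"), min_fields=2):
    """Return the record agreeing on the most leading fields, or None if
    no record reaches min_fields."""
    best, best_depth = None, 0
    for record in records:
        depth = progressive_match(candidate, record, fields)
        if depth > best_depth:
            best, best_depth = record, depth
    return best if best_depth >= min_fields else None
```

Ordering the fields from most to least reliable keeps the check cheap: a name mismatch ends the comparison before zip or affiliation are consulted. A co-author field could be appended to fields in the same way.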
Algorithms
- Pairwise
- Neural
- Regex
Configuration
- Must allow for parameters to be passed in from command line for any algorithm
- Must allow for process flow and order dependency (possible solution is to pipe output and run score multiple times)
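The two configuration requirements above (command-line parameters for any algorithm, and ordered passes where each pass consumes the previous pass's output) might look something like this. It is a sketch under stated assumptions: the --pass flag, its ALGO:key=value,... syntax, and the two toy algorithms are invented for illustration and are not existing Harvester options.

```python
import argparse
import re

def exact_match(pairs, field="name"):
    """Keep pairs whose values for `field` are identical."""
    return [(a, b) for a, b in pairs if a.get(field) == b.get(field)]

def regex_match(pairs, field="name", pattern=".*"):
    """Keep pairs where both values for `field` fully match `pattern`."""
    rx = re.compile(pattern)
    return [(a, b) for a, b in pairs
            if rx.fullmatch(str(a.get(field, "")))
            and rx.fullmatch(str(b.get(field, "")))]

# Hypothetical registry: any algorithm taking (pairs, **params) can be added.
ALGORITHMS = {"exact": exact_match, "regex": regex_match}

def parse_passes(argv):
    """Turn repeated --pass ALGO[:key=value,...] options into an ordered list."""
    parser = argparse.ArgumentParser(description="hypothetical score runner")
    parser.add_argument("--pass", dest="passes", action="append", default=[],
                        metavar="ALGO[:key=value,...]")
    args = parser.parse_args(argv)
    passes = []
    for spec in args.passes:
        name, _, raw = spec.partition(":")
        # Simplification: parameter values may not contain commas.
        params = dict(item.split("=", 1) for item in raw.split(",") if item)
        passes.append((ALGORITHMS[name], params))
    return passes

def run_passes(pairs, passes):
    """Pipe the output of each pass into the next, in command-line order."""
    for algorithm, params in passes:
        pairs = algorithm(pairs, **params)
    return pairs
```

For example, `--pass exact:field=name --pass 'regex:field=zip,pattern=\d{5}'` would first keep pairs with identical names and then keep only those whose zip values are five digits. Running score as separate piped processes, as the bullet suggests, would achieve the same ordering at the process level.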
Transfer
- Query for Ontology Version
- Translate up to current Ontology Version
- Update/Overwrite/Append to Models
- Integrate with Data Provenance
- separate graphs for each harvested data source
- Include metadata about the source of the data
Qualify
- Configuration examples
Utilities
- SDB for the VIVO Harvester (Jena Connect)
- Fix argument parsing for model overrides (e.g., in transfer, the "input-model" argument: "model name for input (overrides config file)")
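The intended override behavior can be illustrated with a short sketch: a value given on the command line replaces the one from the config file, and the config value is kept otherwise. This is not the Harvester's parser; resolve_settings and the dict-based config are assumptions, and only the "input-model" name and its help text come from the item above.

```python
import argparse

def resolve_settings(config, argv):
    """Merge command-line overrides on top of settings read from a config file.

    `config` is a plain dict standing in for the parsed config file.
    """
    parser = argparse.ArgumentParser()
    # default=None lets us tell "not given" apart from any real model name.
    parser.add_argument("--input-model", default=None,
                        help="model name for input (overrides config file)")
    args = parser.parse_args(argv)

    settings = dict(config)  # copy so the caller's config is not mutated
    if args.input_model is not None:
        settings["input-model"] = args.input_model
    return settings
```

With this arrangement, `resolve_settings({"input-model": "fromConfig"}, [])` keeps the config value, while passing `["--input-model", "fromCli"]` replaces it.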
GUI
- Point to VIVO Harvesters (on external systems) by entering their information and retrieving their configuration settings
- View Config Files
- View Logs
- Tabulate data from logs
- graphical elements to display statistical data
- Provide configuration wizards
Additional Libraries
Data Sources
- NIH
- Grants
- PubMed publications
- NSF
- generic government sites
- Grants.gov
- ClinicalTrials.gov
- Scopus
- ISI (depending on agreements)
- Sakai
Testing
- Implement JUnit testing
- Ensure JUnit testing works
- Create development, staging, and production servers for Harvester testing
Process
- Build complete release script
- Implement release process for Harvester
- Create set of rules for staging
- Create set of rules for commits
- Create set of rules for packaging
- Implement release process for virtual appliances
- Create set of rules for updating
- Create set of rules for releasing
Building
- Implement local Maven repository to deal with 303 bug in Maven.
- Add RPM to Maven build
Packaging
- Fix issues with virtual machines
- Ensure virtual machine creation is done as part of release (find a way to automate/semi-automate?)
Documentation & Demoing
- How-tos with pictures
- Example of a running People Harvest on a public server (vivo.ctrip.ufl.edu)
- Example of a running PubMed Harvest on a public server (vivo.ctrip.ufl.edu)
Community
- Create tutorials on SourceForge usage and development integration
- How to FTP upload
- How to SSH in
- Maven/Javadoc integration
External Interfacing
- Installing Joseki How-To
- Installing Sesame How-To
- Drupal How-To
- SPARQL End Point
- Linked Data
- WordPress How-To
- SPARQL End Point
- Linked Data
- Sakai
- SPARQL End Point
- Linked Data
- example of a SPARQL end point (vivo.ctrip.ufl.edu)
- example of using a SPARQL end point (ctrip.ufl.edu?)
VIVO Authentication
- Establish Framework for Plug-and-Play Authentication Modules
- Kerberos
- Shibboleth
- Active Directory (AD)
- LDAP
- Ensure integration of authentication systems with authorization system of VIVO (VIVO group levels)
- GUI
- Security Record Viewer (view security logins from VIVO Application)
- Security Set-up
- link authentication groups to security levels in VIVO
- specify the type of authentication (Shibboleth, Kerberos, whatever is installed)
- Install necessary tools on server from app (such as Shibboleth)
- Modify security files such as shibboleth.xml
VIVO Packaging
Targeted Formats
- Amazon Cloud
- VMware
- Debian Package
- RPM Package
- WAR File
Processes
- Automated Release Process
- Work with Jim so that a single button press against the release code builds all VMs and other packages