Sprint Team

General

Meetings

Monday February 24

Sprint Planning

Tuesday February 25

  • frank asseg
    • dealing with overhead for the SCAPE project

    • beginning to work again on the cluster and getting it operational again

  • A. Soroka
  • Greg Jansen
    • got a 7-node cluster up and running at UNC, using TCP for both discovery and state transfer

    • next step: putting it under load using benchtool

  • Ben Pennell
    • adding three params to benchtool to allow manipulating the behavior with transactions enabled

    • also benchtool now needs to track the time taken for commits

  • Mike Daines
    • having a look at authZ use cases, taking notes as to completion

    • seems to boil down to "we need to store authZ metadata and make decisions using it"

    • discussing contextual info to be used at PEPs (e.g. IP addresses)

  • Scott Prater
    • "With the help of NewRelic tech support, I was able to get all the nodes to send server and webapp metrics to the NewRelic host. So the dashboard now shows real-time resource consumption metrics for all the nodes. I spent quite a few hours last night trying to get all the nodes to talk to each other with the NewRelic jar loaded. I used as a base Frank's jgroups-tcp-config.xml file, which uses TCP for state transfer and UDP for discovery. I continued to see the same problem as I did this weekend, where after the first three nodes have connected, the last three don't; although the other nodes see them, they don't see any other node. I spent some time reading the relatively sparse JGroups documentation (and even sparser Infinispan documentation), and played with timeouts, number of threads, and IP address binding parameters, to no avail. That's where I'm at right now -- I can get a cluster of three nodes running, but not more (though for a brief and shining moment last night, I had five of the six nodes seeing each other. I was unable to reproduce that state, however). Next steps: I'm going to try to use the TCPPING and FD_SOCK methods of node discovery and detection: the docs advocate against this for large clusters, and rightly point out that there's a large performance hit when you go from UDP to TCP, but at this point, I'm more interested in just getting a cluster that works. That will be the focus of my efforts today. If by this afternoon I haven't made headway, I'm going to go to the JGroups list."
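Mike Daines's summary above ("we need to store authZ metadata and make decisions using it", plus contextual info such as IP addresses at PEPs) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name `AuthzSketch`, the method `isPermitted`, the shape of the stored metadata, and the crude string-prefix IP match are hypothetical and are not taken from the Fedora authZ design or any existing API.

```java
import java.util.Map;
import java.util.Set;

public class AuthzSketch {

    /**
     * Hypothetical decision function: combines stored authZ metadata
     * (per-user allowed actions, allowed client IP prefixes) with
     * contextual information supplied at the enforcement point.
     */
    public static boolean isPermitted(Map<String, Set<String>> allowedActionsByUser,
                                      Set<String> allowedIpPrefixes,
                                      String user,
                                      String action,
                                      String clientIp) {
        // Stored metadata: does this user hold the requested action at all?
        Set<String> actions = allowedActionsByUser.getOrDefault(user, Set.of());
        if (!actions.contains(action)) {
            return false;
        }
        // In this sketch an empty prefix set means "no IP restriction".
        if (allowedIpPrefixes.isEmpty()) {
            return true;
        }
        // Contextual info: a simple string-prefix match stands in for real CIDR matching.
        return allowedIpPrefixes.stream().anyMatch(prefix -> clientIp.startsWith(prefix));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> acl = Map.of("alice", Set.of("read", "write"));
        System.out.println(isPermitted(acl, Set.of("10.0."), "alice", "write", "10.0.3.7"));
        System.out.println(isPermitted(acl, Set.of("10.0."), "alice", "write", "192.168.1.5"));
    }
}
```

The point of the sketch is only that the decision needs two inputs: metadata stored with the resource and context available at request time.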

Wednesday February 26

  • Scott Prater
    • "I'm booked in meetings all day today and tomorrow, so I won't be able to make it to the standup until Friday. I have nothing new to report -- no change from yesterday.  My goal sometime in the next 24-36 hours is to configure the TCP discovery, using Greg's configuration, and start running tests."
  • Mike Daines
    • continuing to look at authZ cases
    • looking at XACML spec in this capacity
  • Greg Jansen
    • cluster shutdown issues
      • making more shutdown hooks didn't solve the problem
      • now looking at the use of ModeshapeEngine to replace direct use of JcrRepository type
    • cluster shutdown issues are blocking performance testing for the cluster at UNC
    • when a solution is found, frank asseg will review
  • frank asseg
    • added fresh code to the FIZ cluster
    • doing performance testing now
    • talking to the RADAR Project about the possibility of using Fedora 4 as a metadata repository
  • Ben Pennell
    • Transactional-ized benchtool seems to be working
    • beginning to test it out
    • seeing marked improvement with mutating requests, particularly deletes
  • A. Soroka
    • helping Stefano Cossu work through problems with his "restrictive node types" use case
    • hopefully will turn back to indexer modularity problem soon

 

Thursday February 27

Fedora Committer's Call

Friday February 28

  • frank asseg
    • just got back, working on the cluster and reading comments on transactions in benchtool PR
  • Ben Pennell
    • working on updates to benchtool transactions PR
    • looking at cache files left behind after test runs, tracing this to see if it is a problem. Theory about versionable nodes has been rejected, b/c nodes in question are not versionable. Left running overnight, the cache is still not cleaned out.
    • will ask about this in #modeshape
  • Mike Daines
    • talked with Greg about authZ and posted conclusions of the use case analysis in Pivotal; created some new tickets
    • will link new authz tickets to a wiki page with redesign notes
  • Greg Jansen
    • finishing a PR for modeshape shutdown
    • will return to performance testing on the cluster afterwards

Monday March 03

  • Mike Daines
    • will post notes on changes to auth API to support use cases
    • has some tickets in icebox
    • remove dead code from policy enforcement class
  • frank asseg
    • working on the cluster
    • pulled a new WAR file and installed it; can run ingest with multiple threads now
    • trying to push up the number of threads to increase performance
    • run single node performance test for comparison
    • merge pull request for benchtool from Ben
  • Ben Pennell
    • generating reports on effect of transactions on ingest and other speeds
    • still not able to get GC to clean up cache files
    • default should clean up in 24 hrs; may post another query to the #modeshape list
    • performance degrades considerably as the size of data in the repo grows; will post metrics
  • Greg Jansen
    • finished modeshape shutdown ticket on Thursday, PR is awaiting review
    • testing a WAR on cluster (w/#260 shutdown PR)
      • found configuration problem with hostnames resolving to localhost IP.
      • will run benchmarks today.
    • out Friday PM for local projects
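The hostname problem Greg Jansen reports above (hostnames resolving to a localhost IP) can be checked quickly on each node. A likely reason it matters is that a node advertising a loopback address cannot be reached by its peers. The following is a minimal sketch using only the standard `java.net` API; the class name `HostnameCheck` and the helper `resolvesToLoopback` are hypothetical names chosen for illustration and are not part of any project code.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameCheck {

    /** Returns true if the given host name or IP literal resolves to a loopback address. */
    public static boolean resolvesToLoopback(String hostOrIp) {
        try {
            return InetAddress.getByName(hostOrIp).isLoopbackAddress();
        } catch (UnknownHostException e) {
            // Surface resolution failures instead of silently treating them as "not loopback".
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Report what this machine's own hostname resolves to.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println(local.getHostName() + " -> " + local.getHostAddress());
        if (local.isLoopbackAddress()) {
            System.out.println("WARNING: local hostname resolves to a loopback address");
        }
    }
}
```

Running `main` on each node prints the address the local hostname maps to, which makes a loopback mapping easy to spot before starting the cluster.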

Tuesday March 04

  • Scott Prater
    • working with Greg Jansen's TCP-based config for clustering
    • still having a problem getting more than three members of cluster working together
    • pulling back on New Relic to remove variables and get to the point of doing a load test
  • Greg Jansen
    • working on getting UNC cluster set up
    • testing the clean shutdown of containers in a cluster
    • will get help from Scott Prater to test on UWisc clusters
  • Mike Daines
    • posted wiki page detailing possible mods to API/SPI to support authZ use cases
    • will consider the architectural constraint demanding no dependency from kernel or auth modules on HTTP constructs
  • frank asseg
    • still trying to achieve useful performance for clustered ingest
    • will start a ticket to expand operations in benchtool to include SPARQL Update
  • Ben Pennell
    • benchmarking of transactional workflows
    • seeing performance become worse the more content there is in the repository (diminishing returns)
    • also examining binary garbage collection, doesn't appear to be properly configurable
  • David Wilcox
    • examining and annotating beta objectives
  • Andrew Woods
    • testing delivered tickets
    • doing review for finished tickets
  • David Wilcox and Andrew Woods
    • determining priorities and state of beta objectives
    • preparing for meetings in Washington, D.C. next week

 

Wednesday March 05

  • A. Soroka
  • Mike Daines
    • discussed potential modifications to authorization SPI with Ben Pennell
    • hopefully can now support by design the absence of a servlet container
    • ticket is filed, will work on it today
  • Greg Jansen
    • UNC cluster is operational!
    • it is not yet puppet-configured, but it is configured and it is working
    • there are problems using benchtool against a cluster
      • because it now uses transactions, which are problematic against a load-balanced cluster
    • Greg Jansen will file some tickets to get benchtool to use "sticky sessions" to avoid this
  • frank asseg
    • still running tests against the SCAPE cluster
    • performance is terrible!
    • everything is ingested, but horribly slowly
    • in better news, we have been offered the use of another cluster
      • From a Romanian University
      • Four nodes using Eucalyptus, possibly more to be added
      • Very fast, with very fast interconnects
      • We can have access via SSH, but we will need to put an acknowledgement of their support on our webpage (Andrew Woods and David Wilcox, perhaps you can do this?)
    • frank asseg is also working on a ticket related to benchtool
  • David Wilcox
    • still reviewing and annotating use cases
    • will have more to say on this topic tomorrow or soon
  • Ben Pennell
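The "sticky sessions" idea Greg Jansen raises above can be sketched as follows: requests that belong to a transaction are sent to the node that opened it, while other requests are spread round-robin. This is only an illustration under stated assumptions. The class `StickyRouter` and its methods `route`, `pin`, and `release` are hypothetical and do not describe how benchtool or any particular load balancer works.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StickyRouter {
    private final List<String> nodes;
    private final Map<String, String> pinned = new HashMap<>();
    private int next = 0;

    public StickyRouter(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
    }

    /** Routes to the pinned node for a known transaction, otherwise round-robin. */
    public String route(String txId) {
        if (txId != null && pinned.containsKey(txId)) {
            return pinned.get(txId);
        }
        String node = nodes.get(next);
        next = (next + 1) % nodes.size();
        return node;
    }

    /** Remembers which node created the transaction so later requests follow it. */
    public void pin(String txId, String node) {
        pinned.put(txId, node);
    }

    /** Forgets the mapping once the transaction is committed or rolled back. */
    public void release(String txId) {
        pinned.remove(txId);
    }

    public static void main(String[] args) {
        StickyRouter router = new StickyRouter(List.of("node-a", "node-b", "node-c"));
        String owner = router.route(null);   // request that opens a transaction
        router.pin("tx-1", owner);
        System.out.println(router.route("tx-1")); // follows the owner node
        System.out.println(router.route(null));   // unrelated request rotates
    }
}
```

A client-side mapping like this is one option; a load balancer configured for session affinity would be another way to reach the same effect.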

 

Thursday March 06