"Easy" Installer for DSpace

Primary Goals of this work

Prototype Only

This work is an initial prototype. It should not be considered stable until formally released as a part of DSpace.

Main Goals:

  • To create an easier installation/upgrade process, which does not require familiarity with Maven or Ant.
    • NOTE: However, it should still be possible for developers to build DSpace from source code using current Maven and/or Ant tools
  • The installer should not require recompiling code or pulling down anything via Maven. It should be able to complete successfully with no internet connection whatsoever (i.e. the installer will need to contain all third-party dependencies and the DSpace code).
  • The installer should guide users by asking questions whose answers fill out the basic settings in the dspace.cfg file.
    • Basic settings include the installation location and all database configuration settings
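
As a rough illustration of the "ask questions, fill out dspace.cfg" goal, here is a minimal sketch in plain Java. It is not the prototype's code: the class name ConfigPrompter, the question wording, and the default values are all hypothetical, and the four keys shown are assumed to be the relevant dspace.cfg settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

/** Hypothetical sketch: asks a few questions and collects dspace.cfg-style settings. */
final class ConfigPrompter {

    /** Asks one question; an empty answer (or end of input) keeps the default. */
    static String ask(Scanner in, String question, String defaultValue) {
        System.out.print(question + " [" + defaultValue + "]: ");
        String answer = in.hasNextLine() ? in.nextLine().trim() : "";
        return answer.isEmpty() ? defaultValue : answer;
    }

    /** Collects the basic settings in the order they are asked (keys and defaults are assumptions). */
    static Map<String, String> collect(Scanner in) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("dspace.dir", ask(in, "Installation directory", "/dspace"));
        settings.put("db.url", ask(in, "Database URL", "jdbc:postgresql://localhost:5432/dspace"));
        settings.put("db.username", ask(in, "Database user", "dspace"));
        settings.put("db.password", ask(in, "Database password", "dspace"));
        return settings;
    }

    /** Renders the settings as "key = value" lines, one per setting. */
    static String render(Map<String, String> settings) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            out.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> settings = collect(new Scanner(System.in));
        System.out.println();
        System.out.print(render(settings));
    }
}
```

A real installer would merge these answers into the existing dspace.cfg rather than print them, so that the remaining settings and comments in that file are preserved.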

Potential features requiring further investigation:

  • Can the installer also be used to perform an "upgrade" of DSpace?
    • Obviously, it could not automatically update a site's customizations. But it could potentially help users upgrade more easily to the latest 'out-of-the-box' version of DSpace (and copy their customizations to a backup location, where they can be reapplied as needed once the upgrade is complete).
    • Can the installer help users upgrade their DSpace database? (This would require the installer to know which version the user is currently running, and then call the necessary SQL upgrade scripts in the proper order.)
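
To make the database-upgrade idea a bit more concrete, the sketch below works out which upgrade scripts would need to run, and in what order, given the current and target versions. This is only an illustration: the class name UpgradePlanner, the version list, and the script naming pattern are assumptions, and detecting the currently installed version is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: picks the SQL upgrade scripts needed between two versions. */
final class UpgradePlanner {

    /** Release sequence, oldest first (illustrative only). */
    static final List<String> VERSIONS = Arrays.asList("1.5", "1.6", "1.7", "1.8");

    /** Returns one script per version step, in the order they must be applied. */
    static List<String> scriptsFor(String current, String target) {
        int from = VERSIONS.indexOf(current);
        int to = VERSIONS.indexOf(target);
        if (from < 0 || to < 0 || from > to) {
            throw new IllegalArgumentException("Cannot upgrade from " + current + " to " + target);
        }
        List<String> scripts = new ArrayList<>();
        for (int i = from; i < to; i++) {
            // Assumed naming pattern, e.g. 1.6 -> 1.7 becomes database_schema_16-17.sql
            String a = VERSIONS.get(i).replace(".", "");
            String b = VERSIONS.get(i + 1).replace(".", "");
            scripts.add("database_schema_" + a + "-" + b + ".sql");
        }
        return scripts;
    }

    public static void main(String[] args) {
        System.out.println(scriptsFor("1.6", "1.8"));
    }
}
```

If the installed version is already the target version, the returned list is empty and nothing would be run.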

Initial Usage Details

Where is the code available?

The code is available from the SVN prototype branch at: http://scm.dspace.org/svn/repo/sandbox/installer-prototype/

This code only includes a customized version of the normal 'dspace' Assembly module from Trunk (1.8.0-SNAPSHOT).

How do I build the installer?

After downloading the code, run the following from the [installer-prototype]/dspace/ directory:

mvn package -Pdspace-installer

May need to first build Trunk

Currently, to keep the 'installer-prototype' SVN copy very minimal, I have not copied over an entire version of Trunk. The 'installer-prototype' only includes the 'dspace' Assembly directory. So, if you run into Maven build errors, you may first need to build Trunk, in order to ensure your local Maven repository (~/.m2/) has a copy of all 1.8.0-SNAPSHOT dependencies.

How do I run the installer?

After the build completes, you'll see a JAR installer created at [installer-prototype]/dspace/target/dspace-installer-1.8.0-SNAPSHOT.jar

You can execute this Installer by running the following from the [installer-prototype]/dspace/target/ directory:

java -jar dspace-installer-1.8.0-SNAPSHOT.jar

After you run the installer, it will ask a series of questions around where you wish to install a copy of DSpace.

Currently only works for DSpace Install process

Currently, the Installer only works fully for a clean (fresh) install of DSpace. It is not fully functional in terms of updating or upgrading DSpace (some parts may work, but some may still be buggy).

Initial Implementation Details

How does the Installer work? How is it built/implemented?

  • The initial Installer is packaged using One-JAR. One-JAR essentially provides us with an easy way to create an executable JAR file.
  • When a user runs the JAR file, the One-JAR 'Boot' class is automatically called.
  • The One-JAR 'Boot' class automatically calls whatever is located at /main/main.jar within the installer.jar file. In our case, /main/main.jar calls the new dspace-install-api.jar file (see [installer-prototype]/dspace/src/assemble/installer-assembly.xml for details)
  • The dspace-install-api.jar is what actually performs the installation.
    • This JAR embeds Apache Ant and also contains a custom Ant script (installer-build.xml), which creates the DSpace installation directory much as past versions of DSpace did. More notes on this below.
  • NOTE: The 'dspace-installer.jar' actually includes a full copy of all third party dependencies (JARs) as well as a copy of the DSpace install directory. See the [installer-prototype]/dspace/src/assemble/installer-assembly.xml for details.

Overview of dspace-install-api

The heart of this Installer is the new /dspace/dspace-install-api/ module. This module currently only includes a few new files:

  • org.dspace.install.Installer - This is the main executable Installer class. Currently, it essentially just uses the Apache Ant API to call a custom Installer 'installer-build.xml' file (which is based off the default DSpace [installer-prototype]/dspace/src/main/config/build.xml file). NOTE: Even though this installer uses the Ant API, Ant is not required to be installed on the local system. The Ant API is included within the Installer itself.
  • /src/main/resources/installer-build.xml - This is the Ant build file which tells Ant what it needs to do to perform the install process.

Why embedding Maven into Installer won't work

This section is just a note on implementation details that have unfortunately ended in failure.

Initially, I thought: "If I can embed Ant in the installer to actually create the [dspace] installation directory, why not go one step further and embed Maven, so that the Installer.jar just auto-builds DSpace for you via Maven". The main reason for potentially embedding Maven was to allow for a smaller Installer.jar overall (less duplication of JAR dependencies, for each of the various DSpace WARs), and to allow Maven to do what it does best (namely managing dependencies).

Unfortunately, that is not as simple as it may sound. To properly embed Maven into the Installer.jar, you'd need to do the following:

  1. Embed a complete copy of an offline Maven Repository into the Installer.jar (this would ensure Maven didn't need to go and download all dependencies for you – which obviously slows down the installation process, and wouldn't be much of an improvement over the current DSpace build process)
  2. Embed a copy of Maven into Installer.jar
  3. Ensure that when Installer.jar is run, it kicks off embedded Maven, points it at the embedded offline Maven Repository and then builds & installs DSpace.
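
For step 3, the sketch below shows roughly what "point Maven at an offline repository" looks like on the command line. Note the assumptions: it forks an external mvn process rather than embedding Maven in the installer's JVM, and the class name OfflineMavenLauncher and the directory arguments are hypothetical. The --offline flag and the maven.repo.local property are standard Maven options.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: runs a bundled Maven in offline mode against a bundled repository. */
final class OfflineMavenLauncher {

    /** Builds the command line; nothing is executed here. */
    static List<String> command(File mavenHome, File localRepo) {
        String mvn = new File(mavenHome, "bin/mvn").getPath();
        return Arrays.asList(
                mvn,
                "--offline",                                   // never contact remote repositories
                "-Dmaven.repo.local=" + localRepo.getPath(),   // use the bundled repository
                "package");
    }

    /** Forks Maven in the given project directory and returns its exit code. */
    static int run(File mavenHome, File localRepo, File projectDir) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(command(mavenHome, localRepo));
        pb.directory(projectDir);
        pb.inheritIO(); // show Maven's output on the installer's console
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 3) {
            System.exit(run(new File(args[0]), new File(args[1]), new File(args[2])));
        }
        // Without arguments, just print an example command line.
        System.out.println(String.join(" ", command(new File("tools/apache-maven"), new File("repo"))));
    }
}
```

This only works if the bundled repository already holds every artifact and plugin the build needs, which is exactly the part (step 1) that proved difficult.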

Although #2 and #3 above seem to be possible, there seems to be no easy way to do #1 (embedding a copy of an offline Maven repository). Unfortunately, Maven does not come with a plugin which can successfully create an entire offline repository, and ensure all dependencies are written there. A few more notes on this:

The Conclusion: At least at this point in time, embedding Maven into an Installer is not really a plausible solution. We'll need to find a better way of avoiding an ever-growing Installer.jar file (which is already rather large because some basic dependencies, e.g. dspace-api.jar, are duplicated 7 times in that one Installer.jar: once for each of the six webapps and once in [dspace]/lib).

6 Comments

  1. Tim, my thoughts are that the reason to include a Maven repo and Maven into the installation process is to not have to duplicate the dependencies across each and every webapp and the lib directory. Perhaps we see that as the goal rather than including Maven. It could be a later optimization to the build. For instance (and I would consult with Graham on this...) we can lift most of the required jars into the lib directory and use the servlet container (embedded Jetty or Tomcat) to load those outside the wars. It would mean making the war deployed into webapps exclude the dependencies found in dspace/lib. If I recall, the most critical part of attaining this had to do with breaking out a couple of the classes from dspace-api (DatabaseManager) so that their static state could be separated across the webapps.

    I think the dspace-core-api will come in handy here as well, allowing us to push a core api jar into the webapps for compilation and then use the dspace-api deployed under dspace/lib as the implementation of the api compiled into the war. It is another optimization that starts to look a lot like our deployment scenario for DSpace 2.

    1. That sounds like a similar direction to what I was next planning to investigate. My next step was to see if there was another way to avoid duplication of dependencies in the Installer.jar (similar to what you've suggested above). I just wanted to be sure to capture why embedding of Maven wasn't a plausible solution.

  2. May be worth someone revisiting this work to see if we can find another way to "embed" Maven (even in an "online" mode?). For example, could be worth testing if we could embed Maven still and just require that you still be online to run the Easy Installer? (I'll admit, my initial goal was to see if Maven could be embedded in an 'offline' mode – but, as noted above, that doesn't seem possible because of issues in Maven itself)

    Also, see recent discussion thread from dspace-devel around an "Easy Installer": http://www.mail-archive.com/dspace-devel@lists.sourceforge.net/msg06012.html

    1. Honestly, I think even that might be pushing it a bit far. I don't see what is gained by the whole thing being packaged up in a single executable.

      Right now, the assembly / project layout looks like:

      dspace/
         bin/
         config/
         etc.

      So, what if we create the download to be:

      dspace-release.zip
         tools/
            apache-ant/
            apache-maven/
         dspace/
            bin/
            config/
            etc.

      And then, you have an executable file (jar, shell script, whatever), that asks questions, makes modifications to the configuration in dspace/, and forks out to the Maven and Ant distributions that are included under tools/.

      The user then just has to unzip the file, then run the 'install' executable that wraps up all the other steps. It's (relatively) simple to put together, entirely consistent with our 'advanced' setup, gives a firm foundation for future customizations, and potentially in the future allows us to make upgrades easier too.

      In terms of your 'going offline' frustrations, I would suggest that a (time-consuming but feasible) approach is to grab the artifacts and poms, and fashion a script that does 'mvn install' for each of them - you've then got a local repository with everything you need. But is Maven, on first execution, spending ~ 10 minutes downloading necessary artifacts that big a problem? Really? In most cases, it's an inconvenience more than a problem, and making everyone download a huge installer seems counter-productive - just as long as we are clear that the installer needs to be online and will take a few minutes.

      I can see there being some cases where it's useful to be able to go offline - we could always create a secondary (huge) package that includes all the artifacts and the 'mvn install' scripts to create the local repository. If you need that, you can download the package, put it in the same directory you've unzipped the distribution above to, and the installer could recognise that it's there and install the Maven repository, and force the forked Maven processes to run in offline mode.

      1. I'd say...

        you can create a "local" repo in the app above (something like dspace-release.zip/apache-maven/.m2/repository)

        and you can point Maven at it...  mvn -Dmaven.repo.local=apache-maven/.m2/repository

        you can also toss in your own settings.xml and control defaults there...

        I would run a build without the dspace-xxx projects from scratch; the size of that local repo will be what you would need to be able to ship. I expect it should be under 50 MB to be feasibly shippable.

      2. This does seem like a useful way to rethink about this whole "Easy Installer". Once I get back to this work (currently swamped with other deadlines), I'll look into these suggestions further to see if what I've built can actually be reorganized to do exactly what you suggest (at a glance, I think I've already built most of the 'pieces', it just may be a matter of reorganizing them and repackaging them in a different way).

        Thanks for your thoughts!