This page is a work in progress. If you have notes/hints/tips on DSpace development with Git/GitHub, please feel free to suggest their addition, or even add them to this page directly.
GitHub also has a ton of great guides on working with Pull Requests. They are very useful in understanding the general concept of creating/managing your Pull Requests (PRs).
As noted below, we also recommend forking the main DSpace GitHub repository (https://github.com/DSpace/DSpace). Again, GitHub has excellent guides on managing forks:
Another (third party) tutorial on the entire PR process (including "squashing" your commits into a single commit): https://yangsu.github.io/pull-request-tutorial/
A list of some possibly useful external Git resources:
(Feel free to add to the list)
Still want to use SVN locally, even though DSpace is on GitHub?
(Borrowed from Fedora's Git Guidelines and Best Practices)
Git allows a developer to copy a remote subversion repository to a local instance on their workstation, do all their work and commits in that local repository, then push the state of that repository back to a central facility (github).
Bearing in mind that you will always being doing your work and commits locally, a typical session looks like this:
Get a copy of the central storage facility (the repository). This is how you download a copy of the DSpace Source Code (i.e.
[dspace-source]), but this source code directory also is a valid local git repository. In the below example, we've called this directory "dspace-src", but you can call it whatever you want.
git clone email@example.com:DSpace/DSpace.git dspace-src cd dspace-src
Create a localbranch called "DS-123". This is what is traditionally called a "topic" or "feature" branch in Git. You are creating a branch to work on a specific topic or feature (in this case a ticket named "DS-123").
git branch DS-123
Create a local copy of the branch from master (if it doesn't exist) and make it your active working branch. You are now developing on the
git checkout DS-123
Now, start creating, editing files, testing. When you're ready to commit your changes:
git add [file]
This tells git that the file(s) should be added to the next commit. You'll need to do this on files you modify, also.
Commit your changes locally. This only modifies your localcopy of the repository, and the commits only happen in your local "DS-123" branch.
git commit [file]
Now, the magic:
git push origin DS-123
This command pushes the current state of your local repository, including all commits, up to github ("origin" repository). Your work becomes part of the history of the public "DS-123" branch on github.
git push is the command that changes the state of the remote code branch. Nothing you do locally will have any affect outside your workstation until you
push your changes.
git pull is the command that brings your current local branch up-to-date with the state of the remote branch on github. Use this command when you want to make sure your local branch is all caught up with changes
push'ed to the remote branch. The pull command executes git fetch, which retrieves the actual changes followed by git merge, putting the changes in your codebase.
master: this is the main code branch, equivalent to trunk in Subversion. Branches are generally created off of master. However, it is usually recommended that you not do your work directly in the master branch. Instead, you should look to create new branches frequently (e.g. a new branch for each feature/ticket you are working on), and once that work is completed, merge it back into the master branch. Both branching and merging are much easier in Git, and should become a part of your daily development practices. For more information, see Pro-Git's chapter on "Basic Branching & Merging"
origin: the default remote repository (at GitHub) that all your branches are
pull'ed from and
push'ed to. This is defined when you execute the initial
git clone command. For more information see Pro Git's chapter on "Working with remotes"
unpublished vs. published branches: an unpublished branch is a branch that only exists on your local workstation, in your local repository. Nobody but you know that branch exists. A published branch is one that has been
push'ed up to GitHub, and is available for other developers to checkout and work on.
fast-forward: the process of bringing a branch up-to-date with another branch, by fast-forwarding the commits in one branch onto the other. For more information, see Pro-Git's chapter on "Basic Branching & Merging"
rebase: the process by which you cut off the changes made in your local branch, and graft them onto the end of another branch. It also lets you reorganize or combine local commits in order to "clean up" your commit trail before you share it publicly via GitHub. For more information, see Pro-Git's chapter on Rebasing and GitHub's 'rebase' page.
The DSpace Developers/Committers are still working on our Git Guidelines & Best Practices.
But in the meantime, here's some development guidelines from a few "third parties" (feel free to add additional links)
Clone the repository. (The git repo is ~65MB). In the below example, we named the local directory "dspace-src", but you can name it whatever you want.
git clone git://github.com/DSpace/DSpace.git dspace-src cd dspace-src
At this point, you now have a copy of the DSpace Source Code (i.e.
[dspace-source]), and you are checked out to the branch
master (master is akin to SVN trunk), which will work, but it is the bleeding edge of development and not recommended for production instances.
If you would like to develop on DSpace for your local needs (University, Library, or Institution), you are encouraged to fork this GitHub repository (see also #Developing from a Forked Repository section below), and commit your changes to your personal/organizational repository. We recommend that you build your repository off of a released "tag" of DSpace such as
dspace-4.1. A tag is a human readable marker for a specific commit. Unlike branches, on which commits are added as they grow, tags are supposed to be static. The benefit of being based off of a tag/release-branch is that releases have a series of testing phases to ensure high quality, and there is some maintenance of bug and security fixes.
git checkout dspace-4.1
From there, you can follow the standard DSpace build instructions in order to build/install DSpace from the source code. For example:
mvn package cd dspace/target/dspace-[version]-build.dir ant update /etc/init.d/tomcat6 restart
This approach is recommended for all DSpace developers (especially non-Committers), as it allows developers to store their own local customizations in their own forked GitHub repository.
Although the below instructions only detail how to perform these tasks via the command-line, some developers may find that an Integrated Development Environment (IDE) will provide the same Git commands/options. For more information on using DSpace with an IDE, see the list of IDEs at: Developer Guidelines and Tools
Clone your GitHub Repo to your local machine. Now that you have a fork of the DSpace GitHub repository, you'll want to "clone" your repository to your local machine (so that you can commit to it, etc.). You can clone it to whatever directory you wish. You can click the clone button in the Github web interface, or you can use the following command line instruction. In the below example we call the directory "dspace-src". For this to work, you need to setup an ssh key with Github first:
git clone firstname.lastname@example.org:[your-username]/DSpace.git dspace-src cd dspace-src
You now have the full DSpace source code, and it's also in a locally cloned git repository!
For easier Fetches/Merges, setup an "upstream" repository location. If you have forked the DSpace GitHub repository, then you may want to setup an "upstream" remote that points at the central DSpace GitHub repository. It basically just provides you with an easier to remember "name" for the central DSpace GitHub repository. This is described in more detail in the GitHub "Fork a Repo" guide. The second command will download all new references (branch names, tags, ...) from the upstream repo into your local repo.
git remote add upstream git://github.com/DSpace/DSpace.git git fetch upstream
(Technically you can name it something other than "upstream". But, "upstream" is just the GitHub recommended naming convention to avoid confusion when using "origin" for your personal fork of "upstream" on GitHub, which you need to submit pull requests to "upstream").
Create a branch for each new feature/bug you are working on. Because Git makes branching & merging easy (see Pro-Git's chapter on "Basic Branching & Merging"), you should create new branches frequently (even several times a day) and avoid working directly in the master branch (unless you are making a very minor change). In this case, we'll create a local branch named "DS-123" (note that this branch only exists on your local machine so far). We'll also perform a "checkout" in order to switch over to using this new branch.
git branch DS-123 git checkout DS-123
Do your development work on your new branch, committing changes as you go. Note that at this point, you are only committing changes to your local machine. Nothing new will show up in GitHub yet, until you pushit there. This is a very basic example of a single file commit, but you get the idea.
git commit NameOfFileToCommit.java
Optionally, you can push these changes and this "feature" branch up to your GitHub account. If you want to share your work more publicly, you can push the changes and your new branch up to your personal GitHub repository:
git push origin DS-123
In this command "origin" is actually the name of the repository that you initially cloned (from your own personal GitHub account). This pushes your new branch up to GitHub, so that it is publicly available to other developers.
Optionally, once you have finished your work, you may wish to merge your changes to your "master" branch. Your personal "master" branch is where all your completed code should eventually be merged ("master" is loosely equivalent to "trunk" in Subversion). So, once you are done with the branch development, you should merge that code back into your "master" branch. Luckily, Git makes this simple and will figure out the best way to merge the code for you. In rare situations you may encounter conflicts which Git will tell you to resolve. For more details, see Pro-Git's chapter on "Basic Branching & Merging". In order to perform the merge, you'll first need to switch over to the "master" branch (the branch you are merging into):
git checkout master git merge DS-123
There! You've now merged the changes you made on the "DS-123" branch into your personal "master" branch!
Optionally, push this merge up to your GitHub account. Again, at any time, you can push your local changes up to your GitHub account for public sharing. So, if you want to push your newly merged "master" branch, you'd do the following:
git push origin master
(I.e. You are pushing your local "master" branch up to the "origin" repository at GitHub. Remember, "origin" refers to the repository you initially cloned, which in this example would be your personal GitHub repo that you cloned in Step #1 above.)
Once your branch is no longer needed, you can delete it. Really, there's no need to keep around all these small branches! If you generated a Pull Request from this branch, you will need to keep it around until the Pull Request is either merged or closed. But, once you no longer have any other use for the branch you created, just delete it! Here's an example of deleting the "DS-123" branch from both your local machine and from your public GitHub account (if you shared it there)
# Remove the branch locally first git branch -d DS-123 # If you have pushed it to GitHub, you can also remove it there by doing a new push (notice the ":") git push origin :DS-123
Fetch changes from central DSpace GitHub. New changes/updates/bug fixes happen all the time. So, you want to be able to keep your "fork" up-to-date with the central DSpace GitHub. In this case, you now can take advantage of the "upstream" remote setting that you setup back in Step #2 above. If you recall, in that step, you configured "upstream" to actually point to the central DSpace GitHub repo. So, if there are changes made to the central DSpace GitHub, you can fetch them into your "master" branch as follows:
# Fetch the changes from the repo you named "upstream" git fetch upstream
What this command has done is actually create a new "upstream/master" branch (on your local machine) with the latest changes to be merged from that "upstream" repository.
Merge changes into your Local repository. Remember, "fetching" changes just brings a copy of those changes down to your local machine. You'll then need to merge those changes into your "master" branch, and optionally push the changes back to your personal public GitHub repository.
# First, make sure we are on "master" branch git checkout master # Now, merge the changes in the "upstream/master" branch into my "master" branch git merge upstream/master
In this case, Git will attempt to merge any new changes made in the "upstream" repository into your local "master" branch.
Push those merged changes back up to GitHub. Once you are up-to-date, you may now want to push your latest merge back up to your public GitHub repository.
git push origin master # If the 'fetch' above pulled down new tags/branches, you also may wish to run the following to push those to your own repo. git push origin --all
Additional Handy Git Commands
git status- At any time, you may use this command to determine the status of your local git repository and how many commits ahead or behind it may be from the "origin" repository at GitHub. It also tells you if you have local changes that you haven't yet committed. For more info type:
git help status
git log- At any time, you may use this command to see a log of recent commits you've made to the current branch. For more info type:
git help log
git diff- At any time, you may use this command to see differences of your current in progress work. For more info type:
git help diff
git pull- Can be used instead of using
git fetchfollowed by
git merge. A "pull" does both a fetch and an automated "merge" in a single step.
git stash(See also: Stashing) - Allows you to temporarily "stash" uncommitted changes. This command is extremely useful if you want to do a "pull" or "merge" but were working on something else. You can temporarily "stash" what you are working on to perform the merge/pull, and then use
git stash applyto reapply your stashed work.
git rebase(See also: Rebasing) - This tool is extremely powerful, and can be used to reorganize or combine commits that have been made on a local branch. It can also be used in place of a "merge" (in any of the situations described above). However, as it changes your commit history, you should NEVER USE REBASE ON ANY BRANCH THAT HAS BEEN PUBLICLY SHARED ON GITHUB. For more information, see Pro-Git's chapter on Rebasing and GitHub's 'rebase' page.
git cherry-pick- useful to grab a single commit into current branch from any of the local branches or remote branches added locally. A handy use-case is when we have a master branch and a bugfix branch (e.g. dspace-5_x) and you want a bugfix to go to both branches, you can commit to either branch, then switch to the other and execute "git cherry-pick [commit-hash]". Don't forget to push both branches, as usual. For details, see Getting a commit to multiple branches (backporting)
If you want to show someone the diff between two commits, you can do it directly using GitHub functionality. Example:
As you can see, the commit can be specified either as a hash or as a tag (they're interchangable). Bonus tip: the two commits don't even have to be in the same repository, so you can compare e.g. your fork to the official repo.
In addition to the instructions below, it may be worth reading the GitHub guide on Using Pull Requests, as it provides a good overview of how Pull Requests are used to contribute, review, and discuss code contributions to any GitHub codebase. You may also want to print out and keep handy this Git Pretty poster.
Pull requests can be sent from any branch or commit but it’s recommended that a topic branch be used so that follow-up commits can be pushed to update the pull request if necessary.
Rebasing your PR: As commits are merged into the master branch over time (via merges of other pull requests) your own pull request will probably need updating. This is a fairly straightforward operation, however, it can also easily go wrong. Here's a quick overview of the commands you'll need to run:
cd YOUR_WORKING_COPY_PATH git fetch --all git checkout master #may have to git stash first, but, don't forget, you need to be ON THE MASTER BRANCH before you proceed git pull upstream master git push origin master git pull origin master #not necessary but a nice sanity check git checkout BRANCHNAME git rebase -i master
The third step above is crucial: you must be on the master branch before you proceed with the remaining steps. Sucessfully checking out the master branch may require you to first run
git stash to set aside your work in progress. Doing so may send you off running a series of commands, and you may lose track of where you are going. If you accidentally run
git pull upstream master in your own branch, your pull request will show many (possibly hundreds) of unrelated commits. It's difficult, though not impossible, to reverse this situation. The Git Pretty poster can help you resolve such situations.
As long as the PR contributor checked the "Allow edits from maintainers", Committers should be able to update PRs from any contributor. Here's some hints on doing that.
There's two main approaches:
Finally, push those changes back up to the contributor's forked repository like this:
# Where user/repo is the forked repository # local_branch_name is your local branch, and remote_branch_name is the contributor's PR branch git push email@example.com:user/repo local_branch_name:remote_branch_name
As a committer, to be able to push to the official DSpace/DSpace repository, you need to have your public ssh key added to your GitHub account. To do that, go to Account settings -> SSH Keys. You can add multiple ssh keys (useful if you use multiple machines).
If you don't have a ssh key generated yet, you can generate one using:
ssh-keygen -t dsa
The recommended setup is as follows:
# make sure you have forked the DSpace/DSpace repo on GitHub to your OWN GitHub Account git clone firstname.lastname@example.org:YourName/DSpace.git cd DSpace git remote add upstream email@example.com:DSpace/DSpace.git git fetch upstream # now "git remote -v show" should look like this: origin firstname.lastname@example.org:YourName/DSpace.git (fetch) origin email@example.com:YourName/DSpace.git (push) upstream firstname.lastname@example.org:DSpace/DSpace.git (fetch) upstream email@example.com:DSpace/DSpace.git (push)
For more information, and a sample git workflow, seesection above.
If you want to test a single pull request, there's two options:
Here's the quick steps for checking out a single PR directly (borrowed from the GitHub instructions listed above)
# Assumes the DSpace/DSpace repo is already setup as your "upstream" repository # [ID] = Pull Request number (at the end of the URL) # [LOCAL-BRANCH] = Name of the branch you want to create (locally) to work on this PR git fetch upstream pull/[ID]/head:[LOCAL-BRANCH] # Now checkout your newly created local branch git checkout [LOCAL-BRANCH] # Optionally, you may want/need to rebase this PR based on the latest code on master git rebase master # You can now do anything you want on this local branch to test or update/change the code # Optionally, if you need to, you can push this local branch back up to your forked repo (origin) # And even create a *new* updated PR from it (via the GitHub UI: https://help.github.com/articles/creating-a-pull-request) git push origin [LOCAL-BRANCH]
If you have added an "upstream" repository to your clone of your fork, here's a handy command to make checking out Pull Requests for testing purposes (inspired by this help page on the GitHub site):
Because of the high number of pull requests, the trick below results in a ~70MB download when you first checkout these pull request branches. Not recommended to do this on a low bandwidth connection.
git config --add remote.upstream.fetch +refs/pull/*/head:refs/remotes/upstream/pr/* # to fetch *all* the pull requests, type this git fetch upstream # and check out a specific PR into a local branch (named whatever you want) git checkout pr/248 -b [LOCAL-BRANCH] # For example, this checks out PR #248 into a local branch. You can name it whatever you want, # but in this example it is using this format: # [JIRA-ticket-number]-[PR-number]-[description] git checkout pr/248 -b "DS-1597-PR-248-test-for-oracle-compatibility" # Now you can test this local branch, update it, etc. as you need to # As necessary, you can also create a *new* PR from it by optionally pushing it up to your forked repo git push origin [LOCAL-BRANCH]
This will be explained on the example of getting a bugfix to both:
Create two separate pull requests, one for the master branch, one for the release branch.
The terminology used here will assume the setup described in Recommended setup of repositories for commiters.
First, make sure you have the release branch (e.g. 'dspace-5_x') set up correctly in your local repo. It needs to be setup to "track" the branch in the "upstream" repo (read above on how to configure an upstream repository)
# confirm you have set upstream correctly git remote -v # should output something like: # origin firstname.lastname@example.org:yourgithubname/DSpace.git (fetch) # origin email@example.com:yourgithubname/DSpace.git (push) # upstream firstname.lastname@example.org:DSpace/DSpace.git (fetch) # upstream email@example.com:DSpace/DSpace.git (push) # then check out the release branch git checkout dspace-5_x # note, the above command may not work for you, you may have to do this instead: # git checkout -b dspace-5_x upstream/dspace-5_x # should output something like: # Branch dspace-5_x set up to track remote branch dspace-5_x from upstream. # Switched to a new branch 'dspace-5_x'
# Make sure you are on your release branch (e.g. dspace-5_x) git checkout dspace-5_x # If you just merged to upstream/master, fetch the list of latest revisions from the upstream repo. not needed if you have the commit anywhere in the local repo. git fetch upstream # This is the hash of the original commit git cherry-pick abc123def456 # check that it's correct git log # Finally, push from your local repo to the upstream repo branch git push upstream dspace-5_x
If you have checked out DSpace 1.8.2 or previous via GitHub, the first time you build DSpace, Maven may error out with a message similar to:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-war-plugin:2.1.1:war (default-war) on project dspace-sword-client-xmlui-webapp: Execution default-war of goal org.apache.maven.plugins:maven-war-plugin:2.1.1:war failed: basedir /dspace-src/dspace-sword-client/dspace-sword-client-xmlui-webapp/src/main/webapp does not exist -> [Help 1]
This error is essentially an artifact of DSpace only supporting SVN in previous releases. Unfortunately, although these "/src/main/webapp/" empty directories existed in SVN, they are ignored by Git/GitHub. This is due to Git's inability to track empty directories.
So, if you run into any error while trying to recompile with
mvn package that a specific "src/main/webapp" directory does not exist, then you will have to create that directory. The DSpace GitHub repository has since fixed this issue (on the latest "master" code and all future releases). But if it affects you, then these are the steps to fix this.
First, create the missing "src/main/webapp" directories. For example, these are the ones missing in Git for DSpace 1.8.x:
mkdir -p dspace-sword-client/dspace-sword-client-xmlui-webapp/src/main/webapp/ mkdir -p dspace/modules/jspui/src/main/webapp mkdir -p dspace/modules/lni/src/main/webapp mkdir -p dspace/modules/oai/src/main/webapp mkdir -p dspace/modules/solr/src/main/webapp mkdir -p dspace/modules/sword/src/main/webapp mkdir -p dspace/modules/swordv2/src/main/webapp mkdir -p dspace/modules/xmlui/src/main/webapp
In each directory, put a place-holder ".gitignore" file, so that Git tracks the directory. For example:
touch dspace-sword-client/dspace-sword-client-xmlui-webapp/src/main/webapp/.gitignore touch dspace/modules/jspui/src/main/webapp/.gitignore touch dspace/modules/lni/src/main/webapp/.gitignore touch dspace/modules/oai/src/main/webapp/.gitignore touch dspace/modules/solr/src/main/webapp/.gitignore touch dspace/modules/sword/src/main/webapp/.gitignore touch dspace/modules/swordv2/src/main/webapp/.gitignore touch dspace/modules/xmlui/src/main/webapp/.gitignore