
It's Time to Fix Subversion Merge

Posted by Andy Singleton on Mon, Jul 11, 2011
  
  

In my article Why Subversion does not Suck, I argued that Subversion fills a need for simplicity and team synchronization.  It also has an advantage handling large files and repositories.  However, there is one thing about Subversion that definitely sucks:  Merging.  It's time to fix Subversion merge.

Use Cases

Many developers move from Subversion to other SCM systems with better merge capabilities, because modern programming workflows require a lot of merges.

Often, teams work with parallel branches or builds, as in these examples:

  • When code is ready to release, the developers make a release branch.  Active development continues on the main branch, and bug fixes are merged between the two branches as needed.
  • A custom version of the software is maintained as a branch or a copy/fork/clone, with updates and improvements merged from the main branch or repository.
  • A development process has builds for each stage of a test-and-release pipeline: development, testing, stabilization and localization, and release.  Changes get passed from one stage to the next.

Subversion can handle the simple parallel case of a development trunk with mostly static release branches. Subversion teams rarely use a multi-stage pipeline, because merging to each stage is too much work.

Users of other SCM systems employ many workflows that involve passing changes up a hierarchy, including:

  • Open source and team workflows where each "contributor" makes a copy of the code (a branch, clone, or fork), and passes changes or merge requests up to a "maintainer", who merges them and tests them.   This process has revolutionized open source development because anyone can propose a contribution.
  • Feature branches that hold systemic changes that might disrupt existing features.
  • Code reviews that promote changes from a developer or review branch into a trunk or master.
  • Big systems that are assembled by merging contributions from many teams up a hierarchy.

Subversion 1.6 can handle the simplest case of hierarchical development, with feature branches feeding a trunk.  However, many teams avoid this structure because of problems merging back and forth.

A more effective merge will eliminate frustration in existing Subversion projects, and it will give Subversion users access to all of these parallel and hierarchical workflows, greatly increasing productivity.

How It Should Work

We want to be able to say:

merge <source>

This merges any new changes from the source into our working copy, as automatically as possible.  Source can be any branch or foreign repository, but usually one that has a common ancestor.  It should recognize changes that we already merged, even if they traveled through several other branches and repositories before coming back to us.

We also want to be able to cherrypick:

merge <source> <specific changes> 

My proposal is to enhance the client with a new merge command - let's call it "NewMerge".  Our NewMerge command will be simplified to avoid some cases that cause problems and confusion.

  • We don't select changes between two versions, or "reintegrate", or specify "depth".  Those instructions become unnecessary.

The existing svn merge implementation has a lot of extra complexity (with reliability and performance problems) because it tracks merges on subtrees.  This seems silly because the documentation recommends that you avoid subtree merges.  If you want to work on a subtree, you can make a new branch or repository that contains the subtree.  We can track its relationship and merge it back to the complete tree.

  • Merging is always on the entire branch
  • There is no merge_history / mergeinfo on individual files, or on subdirectories.  Merge history is only tracked at the root of a branch.
  • No merging to mixed-revision working copies
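The restrictions above could be enforced by a small check before any merge starts.  The following is a minimal sketch in Python, not part of any existing Subversion client: the function name, the exception, and the shape of the inputs (a path-to-revision map gathered from the working copy) are all assumptions made for illustration.

```python
from __future__ import annotations


class MergeRefused(Exception):
    """Raised when a merge request breaks one of the NewMerge restrictions."""


def validate_merge_target(
    target_path: str,
    branch_root: str,
    path_revisions: dict[str, int],
) -> None:
    """Refuse merges that are not whole-branch merges into a uniform working copy.

    All three parameters are hypothetical inputs that a client would gather
    from its working copy; they are not part of any existing Subversion API.
    """
    # Restriction: merging is always on the entire branch.
    if target_path.rstrip("/") != branch_root.rstrip("/"):
        raise MergeRefused(
            f"{target_path!r} is not the branch root {branch_root!r}"
        )

    # Restriction: no merging to mixed-revision working copies.
    revisions = set(path_revisions.values())
    if len(revisions) > 1:
        raise MergeRefused(
            f"working copy is at mixed revisions: {sorted(revisions)}"
        )
```

A client would call validate_merge_target once, before it contacts the server, and tell the user to update the working copy or change to the branch root if the check fails.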

NewMerge will be extended to handle the case of code being passed between multiple repositories.  This is a common case in modern workflows, with "clones" in addition to branches.  Svn merge already supports "foreign merge", but we need to build in merge tracking.

NewMerge Architecture

Merging is done on the client side, where a user can resolve merge conflicts.  The merge will work smoothly if we have a client that is smart enough and has access to enough information.  So, we need a modified client, and we need to give it more information.

We can use the same information strategy that is built into Subversion now, which is to track the changes that are in our branch, and the branch we are merging from.  We don't have the complete content of each revision on the client, but we can get lists of changes.  Then, we can decide which changes to apply to our working copy.
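As a rough illustration of that decision, here is a minimal Python sketch.  It assumes each change can be identified by a (repository, revision) pair and that the client already has the list of changes in the source and the set of changes in our branch; the names Change and changes_to_apply are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Change(NamedTuple):
    """Hypothetical identifier for one change: which repository, which revision."""

    repo_id: str
    revision: int


def changes_to_apply(
    source_changes: list[Change],
    already_in_target: set[Change],
) -> list[Change]:
    """Return the source changes that our branch does not contain yet.

    The order of the source list is preserved so that changes can be
    applied to the working copy in the order they were made.
    """
    return [c for c in source_changes if c not in already_in_target]
```

The sketch only selects changes.  Applying them and resolving conflicts would still happen in the working copy, as described above.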

We need to do a good job tracking those changes.  The existing "mergeinfo" is not adequate.  It does not work for merging between different repositories.  It also gets confused by the hierarchical case, and by file moves.

NewMerge keeps a more detailed history of what it merged.  It will keep this history in a new file, which I will cleverly call "merge_history".  This is a file that goes into the root of each branch and gets committed along with the results of the merge.

NewMerge is backward compatible with existing servers and clients.  NewMerge can save its history in existing servers.  Anyone who wants to merge in a NewMerge system should use a NewMerge client, BUT most Subversion users don't merge.  95% of Subversion users just commit and update from one branch.  These users can get away with using the old clients.

Merge algorithms are complicated, and there are many different merge algorithms that can yield different or improved results.  We have open source software so that clever programmers can improve these algorithms.  NewMerge should be documented so that contributors can easily modify or extend it.  The advantage of NewMerge will come from continuous improvement, rather than from the initial release.

Implementation

merge_history

We will put merge history in a file that goes into the root of the branch.  The existing mergeinfo goes into a property (svn:mergeinfo), but I recommend a file because files are stored incrementally in each revision, and they can be bigger.  We want room to work.

We will refer to revisions by GUID (server ID + Revision), and not just by revision.  This allows us to track changes from foreign repositories.
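A minimal Python sketch of such an identifier follows.  It assumes the "server ID" is the repository UUID that every Subversion repository already has, and it invents a "uuid:revision" text form for storage; the class name and the text form are illustrative assumptions, not an existing format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class RevisionId:
    """Globally unique revision: repository identity plus revision number."""

    repo_uuid: str
    revision: int

    def __str__(self) -> str:
        # Hypothetical text form used when writing the id to a file.
        return f"{self.repo_uuid}:{self.revision}"

    @classmethod
    def parse(cls, text: str) -> "RevisionId":
        """Parse the hypothetical 'uuid:revision' text form."""
        repo_uuid, _, rev = text.rpartition(":")
        if not repo_uuid or not rev.isdigit():
            raise ValueError(f"not a revision id: {text!r}")
        return cls(repo_uuid, int(rev))
```

With this, revision 42 of one repository and revision 42 of a foreign repository compare as different changes, which a bare revision number cannot express.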

When we make a new branch or clone a repository, we copy the merge history from the source branch, and add any tree changes, such as making a branch out of a subdirectory.

The merge history data structure should be extensible with something like key-value pairs or JSON, so that anyone who is making improvements to NewMerge can add to it.
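To make the idea concrete, here is a minimal Python sketch that stores merge_history as JSON.  The field names ("merged", "source", "revisions") and the helper functions are assumptions for illustration; the point is only that a client can add extra keys and that keys it does not understand survive a load and save.

```python
from __future__ import annotations

import json


def load_merge_history(text: str) -> dict:
    """Parse the file content, keeping any keys this client does not know."""
    history = json.loads(text) if text.strip() else {}
    history.setdefault("merged", [])
    return history


def record_merge(history: dict, source: str, revisions: list[str], **extra) -> dict:
    """Append one merge record; extra keyword arguments are stored as-is."""
    # Extra keys go in first so they can never overwrite the core fields.
    entry = {**extra, "source": source, "revisions": sorted(revisions)}
    history["merged"].append(entry)
    return history


def dump_merge_history(history: dict) -> str:
    """Serialize with stable key order so the committed file diffs cleanly."""
    return json.dumps(history, indent=2, sort_keys=True) + "\n"
```

For example, an improved client could call record_merge(history, "^/trunk", ["uuid-a:10"], tree_changes=[...]) to keep notes about file moves, and an older client would still read and rewrite the file without losing them.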

We can fix some specific problems with the extensible data structure.  Tree change operations (file moves) seem to cause problems.  If we need to, we can keep more information about the tree changes in the merge history file.  Do we need to save the diff that describes the edits to resolve any merge conflicts?  Do we need to save the list of the files that were changed in each revision that went into a merge?  What other information is required for doing good merges?

Prototype

We will need a prototype of NewMerge.  Then we can document it as a community project and start improving it. I think that we can start with the existing merge implementation.

  • Apply the restriction that we only merge and track merges on complete branches.  This will immediately make it simpler and more reliable.
  • Remove the extra code for dealing with subtree mergeinfo
  • Move the mergeinfo into the new merge_history file and extensible data structure
  • Fix the simple case of "reintegrate" from a feature branch to a trunk with multiple iterations. The feature developer should be able to say "newmerge <from trunk>" at any time to get updates, and the trunk should be able to say "newmerge <from branch>" at any time to try the new feature.  Currently, this only works if both sides provide special instructions.
  • Add the GUID tracking of revisions and merges from foreign repositories.
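The back-and-forth case between a feature branch and a trunk can be simulated with a toy model.  This Python sketch assumes each branch keeps a set of the change ids it contains and that a new branch copies its parent's history; it ignores file contents, conflicts, and the fact that a real merge commit creates its own revision.  The Branch class and its newmerge method are hypothetical.

```python
from __future__ import annotations


class Branch:
    """Toy branch that only tracks which change ids it contains."""

    def __init__(self, name: str, parent: "Branch | None" = None) -> None:
        self.name = name
        # A new branch starts with a copy of its parent's merge history.
        self.history: set[str] = set(parent.history) if parent else set()

    def commit(self, change_id: str) -> None:
        """Record a change made directly on this branch."""
        self.history.add(change_id)

    def newmerge(self, source: "Branch") -> list[str]:
        """Merge everything the source has that we lack; return what was applied."""
        new_changes = sorted(source.history - self.history)
        self.history.update(new_changes)
        return new_changes


trunk = Branch("trunk")
trunk.commit("repo-A:1")
feature = Branch("feature", parent=trunk)
feature.commit("repo-A:2")
trunk.commit("repo-A:3")

print(feature.newmerge(trunk))   # feature catches up with trunk
print(trunk.newmerge(feature))   # trunk tries the new feature
print(feature.newmerge(trunk))   # nothing new, so nothing is re-applied
```

Neither side needs special instructions in this model, because each merge is just a set difference against the recorded history.  Whether the real client can reduce every case to that is an open question for the prototype.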

Many thanks to the WANdisco team, including Subversion contributors Hyrum Wright, Philip Martin, and Julian Foad.  They have shaped this project and will participate.  I look forward to working with you on this, and I will be excited when Assembla supports all of the new coding workflows for Subversion.

Check out our implementation of Subversion Merge Requests for free with Assembla Renzoku.


COMMENTS

Use GIT duh

posted @ Tuesday, July 12, 2011 4:16 PM by Wes McClure


Every time I hear "svn merge" I feel pain. So I suggest you don't name it "svn newmerge". Name it "git merge".

posted @ Thursday, July 28, 2011 7:37 AM by cail


Offtopic: 
Why bother? Use git! 
 
Ontopic: 
My biggest problem with merging in Subversion: if you merge a lot of changes at once and commit them, it looks like one change in the branch. This obfuscates the hell out of blame. That is nothing serious, of course, because blame is not used often, but when you do need blame it can be hell if you've merged a lot, because all the lines are tagged with the comment: Merged stuff from feature branch x. And if you've cleaned up old feature branches, it takes a lot of hassle to find the log entry where you made an interesting coding decision without commenting on it enough... 
 
BTW: git doesn't suffer from this history loss on merge problem ;)

posted @ Thursday, July 28, 2011 7:40 AM by Jaap


Guys, whether you like it or not, lots of people are still using Subversion and will do so for quite some time. 
 
So even if Git's merge is vastly superior, "just use git" isn't exactly helpful in this context.

posted @ Thursday, July 28, 2011 8:17 AM by sapporo


We use savana: http://savana.codehaus.org/gettingstarted.html 
 
It solves problems with merges and reviews: everyone works in his private branch, synced with trunk, until he is ready to promote changes to trunk. 

posted @ Thursday, July 28, 2011 8:37 AM by Volodymyr Lisivka


Copy trunk 
svn cp -m "branch from trunk" ^/trunk ^/branch/x 
svn switch ^/branch/x 
 
Do the changes on the branch. If required catch-up any changes from trunk. 
 
svn merge ^/trunk 
 
When you want to merge branch back with trunk, switch to trunk and reintegrate. 
 
svn switch ^/trunk 
svn merge --reintegrate ^/branch/x 
 
Commit and done. 

posted @ Thursday, July 28, 2011 8:40 AM by merger


I agree that git merge works much better. That is why we use git for Assembla development. It is also why the Assembla git tool supports merge-intensive git workflows such as fork and merge, and Gerrit code review branches. 
 
However, a LOT of Assembla customers use Subversion. A lot. And, they use it for valid reasons. For example, some of them work with designers who aren't programmers and have big files. That's a pretty decent line of business for Assembla. We have people call in looking for hundreds of gigabytes, which Subversion can give them. Others find that it fits their programming workflow. 
 
Given the high demand for Subversion here at Assembla, we took a new look at it and realized that there are many ways that we can innovate and refresh Subversion. We've started projects to: 
* Improve and refine our git workflow. That's the foundation. 
* Fix Subversion merge 
* Make Subversion even more accessible with an automatic replication mode. So, it will be useful for file sharing and backup, even for non-programmers. We're working with Tortoise project members on that. 
* Use the new merge capabilities to offer clone, review, and merge workflows in Subversion. 
 
Let me provide an example of how this could affect the type of big companies that use a lot of Subversion. They often put all of their production code, for multiple projects, in one big svn repository. When we have finished our upgrades, they will be able to branch or clone any subtree for active development. So, they will have a map of their active development projects, and each of those active projects will be able to use the workflow that is most effective.

posted @ Thursday, July 28, 2011 8:47 AM by Andy Singleton


There are quite a number of variables here. The first is "what constitutes a 'valid' reason." Your assertion is that non-programmers with big files can't use git (for some reason). Yet the assertion that it's perfectly reasonable to expect them to perform merges in a source control system goes unchallenged. Perhaps they don't need a source control system at all, but rather an asset-management system. Perhaps you wish to build them one using Subversion as its underlying data store (I wish you luck). I'd be a bit suspicious, however, of any solution that starts out by trying to "fix" the way Subversion's merge strategy works. As previous commenters have pointed out, as long as you're going to build a bunch of software to abstract away details from the end-user, why not simplify your problem and use a system whose support for merges is first-rate? Given that you're all well-versed in git workflows, it should be an easier problem to solve. Also bear in mind that one can easily import from Subversion to Git (or Mercurial, etc.).  
 
That said, if you're truly committed to your approach, I wish you the best of luck. Should you be successful, lots of people will be very happy.

posted @ Thursday, July 28, 2011 9:05 AM by Christian Romney


Fixing svn is a lot of work. You could instead write a wrapper around git-svn to make it look like svn. The wrapper can enforce a centralized workflow. The user should not be able to tell the difference.

posted @ Thursday, July 28, 2011 9:43 AM by My name


Comments have been closed for this article.
