It's Time to Fix Subversion Merge

Posted by Andy Singleton on Jul 11, 2011 11:35:00 AM

In my article Why Subversion does not Suck, I argued that Subversion fills a need for simplicity and team synchronization. It also has an advantage in handling large files and repositories. However, there is one thing about Subversion that definitely sucks: Merging. It's time to fix Subversion merge.

Use Cases

Many developers move from Subversion to other SCM systems with better merge capabilities, because modern programming workflows require a lot of merges.

Often, teams work with parallel branches or builds, as in these examples:

  • When code is ready to release, the developers make a release branch.  Active development continues on the main branch, and bug fixes are merged between the two branches as needed.
  • A custom version of the software is maintained as a branch or a copy/fork/clone, with updates and improvements merged from the main branch or repository.
  • A development process has builds for each stage of a test-and-release pipeline: development, testing, stabilization and localization, and release.  Changes get passed from one stage to the next.

Subversion can handle the simple parallel case of a development trunk with mostly static release branches. Subversion teams rarely use a multi-stage pipeline, because merging into each stage is too much work.

Users of other SCM systems employ many workflows that involve passing changes up a hierarchy, including:

  • Open source and team workflows where each "contributor" makes a copy of the code (a branch, clone, or fork), and passes changes or merge requests up to a "maintainer", who merges them and tests them.   This process has revolutionized open source development because anyone can propose a contribution.
  • Feature branches that hold systemic changes that might disrupt existing features.
  • Code reviews that promote changes from a developer or review branch into a trunk or master.
  • Big systems that are assembled by merging contributions from many teams up a hierarchy.

Subversion 1.6 can handle the simplest case of hierarchical development, with feature branches feeding a trunk.  However, many teams avoid this structure because of problems merging back and forth.

A more effective merge will eliminate frustration in existing Subversion projects, and it will give Subversion users access to all of these parallel and hierarchical workflows, greatly increasing productivity.

How It Should Work

We want to be able to say:

merge <source>

This merges any new changes from the source into our working copy, as automatically as possible.  Source can be any branch or foreign repository, but usually one that has a common ancestor.  It should recognize changes that we already merged, even if they traveled through several other branches and repositories before coming back to us.

We also want to be able to cherry-pick:

merge <source> <specific changes> 
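To make the intended behavior concrete, here is a toy model in Python. It is a sketch under stated assumptions, not Subversion code: each branch is reduced to the set of change IDs it contains (its own commits plus everything merged into it), and `ChangeId`, `Branch`, `eligible` and `merge` are hypothetical names. Because merged changes are recorded by ID rather than by path or revision range, a change that traveled through other branches or repositories is recognized when it comes back, and a cherry-picked change is not applied again by a later full merge.

```python
from typing import Iterable, NamedTuple, Optional


class ChangeId(NamedTuple):
    """Hypothetical global change ID: repository identifier plus revision number."""
    repo: str
    rev: int


class Branch:
    """Toy branch: every change it contains, whether committed here or merged in."""

    def __init__(self, repo: str) -> None:
        self.repo = repo
        self.contains: set[ChangeId] = set()

    def commit(self, rev: int) -> ChangeId:
        change = ChangeId(self.repo, rev)
        self.contains.add(change)
        return change


def eligible(source: Branch, target: Branch) -> list[ChangeId]:
    """Changes the source has that the target does not, in a stable order."""
    return sorted(source.contains - target.contains)


def merge(source: Branch, target: Branch,
          changes: Optional[Iterable[ChangeId]] = None) -> list[ChangeId]:
    """'merge <source>' when changes is None; 'merge <source> <specific changes>' otherwise."""
    todo = eligible(source, target)
    if changes is not None:
        wanted = set(changes)
        todo = [c for c in todo if c in wanted]  # cherry-pick: only the requested changes
    # A real client would apply each change's diff to the working copy here.
    target.contains.update(todo)
    return todo


trunk, feature, clone = Branch("A"), Branch("A"), Branch("B")
trunk.commit(1)
merge(trunk, feature)          # feature picks up A@1
feature.commit(2)
merge(feature, clone)          # the clone in repository B gets A@1 and A@2
clone.commit(1)
print(merge(clone, trunk))     # [ChangeId(repo='A', rev=2), ChangeId(repo='B', rev=1)]
print(merge(trunk, feature))   # [ChangeId(repo='B', rev=1)]
```

In this model the same command works in both directions and can be repeated at any time; the hard part in a real client is applying the diffs and resolving conflicts, not deciding which changes are new.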

My proposal is to enhance the client with a new merge command - let's call it "NewMerge".  Our NewMerge command will be simplified to avoid some cases that cause problems and confusion.

  • We don't select changes between two versions, or "reintegrate", or specify "depth". Those instructions become unnecessary.

The existing svn merge implementation has a lot of extra complexity (with reliability and performance problems) because it tracks merges on subtrees. That complexity is hard to justify, because the Subversion documentation recommends avoiding subtree merges. If you want to work on a subtree, you can make a new branch or repository that contains the subtree. We can track its relationship and merge it back to the complete tree.

  • Merging is always on the entire branch
  • There is no merge_history / mergeinfo on individual files or on subdirectories. Merge history is only tracked at the root of a branch.
  • No merging to mixed-revision working copies
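These restrictions are cheap to enforce before any merge work starts. A minimal sketch, assuming the client knows which path is the branch root and which revision each checked-out path is at; `WorkingCopy`, `MergeRefused` and `check_preconditions` are hypothetical names, not existing Subversion APIs.

```python
from dataclasses import dataclass


class MergeRefused(Exception):
    """Raised when a hypothetical NewMerge precondition is not met."""


@dataclass
class WorkingCopy:
    branch_root: str           # root of the branch, e.g. "/branches/feature"
    target_path: str           # path the user asked to merge into
    revisions: dict[str, int]  # revision of each checked-out path


def check_preconditions(wc: WorkingCopy) -> None:
    """Enforce the NewMerge rules: whole-branch merges into a single-revision working copy."""
    if wc.target_path.rstrip("/") != wc.branch_root.rstrip("/"):
        raise MergeRefused("merge target must be the branch root, not a subtree")
    found = sorted(set(wc.revisions.values()))
    if len(found) > 1:
        raise MergeRefused(f"mixed-revision working copy (revisions {found}); update first")


check_preconditions(WorkingCopy("/branches/feature", "/branches/feature",
                                {"src/a.c": 120, "src/b.c": 120}))  # passes silently
```

Refusing early keeps the merge logic itself simple: it never has to reason about a tree whose parts are at different revisions.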

NewMerge will be extended to handle the case of code being passed between multiple repositories. This is a common case in modern workflows, with "clones" in addition to branches. svn merge already supports "foreign merge" (merging from a different repository), but we need to build in merge tracking.

NewMerge Architecture

Merging is done on the client side, where a user can resolve merge conflicts. The merge will work smoothly if the client is smart enough and has access to enough information. So, we need a modified client, and we need to give it more information.

We can use the same information strategy that is built into Subversion now, which is to track the changes that are in our branch, and the branch we are merging from.  We don't have the complete content of each revision on the client, but we can get lists of changes.  Then, we can decide which changes to apply to our working copy.

We need to do a good job tracking those changes. The existing "mergeinfo" is not adequate. It does not work for merging between different repositories. It also gets confused by the hierarchical case, and by file moves.

NewMerge keeps a more detailed history of what it merged.  It will keep this history in a new file, which I will cleverly call "merge_history".  This is a file that goes into the root of each branch and gets committed along with the results of the merge.

NewMerge is backward compatible with existing servers and clients.  NewMerge can save its history in existing servers.  Anyone who wants to merge in a NewMerge system should use a NewMerge client, BUT most Subversion users don't merge.  95% of Subversion users just commit and update from one branch.  These users can get away with using the old clients.

Merge algorithms are complicated, and there are many different merge algorithms that can yield different or improved results.  We have open source software so that clever programmers can improve these algorithms.  NewMerge should be documented so that contributors can easily modify or extend it.  The advantage of NewMerge will come from continuous improvement, rather than from the initial release.

Implementation

merge_history

We will put merge history in a file that goes into the root of the branch. The existing mergeinfo is stored in a property (svn:mergeinfo), but I recommend a file because files are stored incrementally in each revision, and they can be bigger. We want room to work.

We will refer to revisions by GUID (repository ID + revision number), and not just by revision number. This allows us to track changes from foreign repositories.
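Subversion already gives every repository a UUID (svn info reports it as "Repository UUID"), so one natural choice for the GUID is that UUID plus the revision number. The sketch below assumes a hypothetical "uuid@rev" text form for storing these GUIDs in merge_history; `RevisionGuid` is an illustrative name. The point is that revision 42 of two different repositories stays distinct.

```python
from typing import NamedTuple


class RevisionGuid(NamedTuple):
    """Hypothetical revision GUID: repository UUID plus revision number."""

    repo_uuid: str
    rev: int

    def __str__(self) -> str:
        # Assumed on-disk form in merge_history: "<repository UUID>@<revision>"
        return f"{self.repo_uuid}@{self.rev}"

    @classmethod
    def parse(cls, text: str) -> "RevisionGuid":
        repo, sep, rev = text.rpartition("@")
        if not sep or not repo:
            raise ValueError(f"not a revision GUID: {text!r}")
        return cls(repo, int(rev))


ours = RevisionGuid("uuid-of-repo-A", 42)
theirs = RevisionGuid("uuid-of-repo-B", 42)
print(ours == theirs)                         # False: same number, different repositories
print(RevisionGuid.parse(str(ours)) == ours)  # True
```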

When we make a new branch or clone a repository, we copy the merge history from the source branch, and add any tree changes, such as making a branch out of a subdirectory.
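A minimal sketch of that step, assuming merge_history is held in memory as a plain dictionary with hypothetical "merged", "ancestry" and "tree_changes" keys, and that revision GUIDs are written as "repository@revision" strings. The key names, the "root_is_subdir" record and `history_for_new_branch` are illustrative, not an existing format.

```python
import copy


def history_for_new_branch(source_history: dict, source_path: str,
                           branch_point: str, subdir: str = "") -> dict:
    """Build the merge_history for a new branch or clone (hypothetical format).

    The new branch inherits everything the source has already merged, records
    where it was copied from, and notes any tree change made while branching.
    """
    history = copy.deepcopy(source_history)  # never mutate the source branch's record
    history.setdefault("ancestry", []).append(
        {"copied_from": source_path, "at": branch_point})
    if subdir:
        # Tree change: the new branch root corresponds to a subdirectory of the source.
        history.setdefault("tree_changes", []).append(
            {"op": "root_is_subdir", "path": subdir})
    return history


trunk_history = {"merged": ["uuid-of-repo-B@17"]}
lib_history = history_for_new_branch(trunk_history, "/trunk",
                                     "uuid-of-repo-A@250", "lib/parser")
```

Because the inherited entries travel with the new branch, a later merge back to trunk can tell that uuid-of-repo-B@17 is already present on both sides.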

The merge history data structure should be extensible with something like key-value pairs or JSON, so that anyone who is making improvements to NewMerge can add to it.
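JSON also makes it easy for an older NewMerge client to leave newer data alone. A minimal sketch using Python's standard json module and the same hypothetical keys; the "x-review" key stands in for an extension added by some other contributor. It survives a load, update and save because the client never discards keys it does not understand. The function names are illustrative.

```python
import json


def load_history(text: str) -> dict:
    """Parse merge_history; an empty or missing file means no history yet."""
    history = json.loads(text) if text.strip() else {}
    history.setdefault("format", 1)   # hypothetical version marker
    history.setdefault("merged", [])
    return history


def record_merge(history: dict, guids: list[str]) -> None:
    """Append newly merged revision GUIDs, skipping ones already recorded."""
    merged = history["merged"]
    for guid in guids:
        if guid not in merged:
            merged.append(guid)


def dump_history(history: dict) -> str:
    """Serialize with stable key order and indentation so committed versions diff cleanly."""
    return json.dumps(history, indent=2, sort_keys=True) + "\n"


text = '{"format": 1, "merged": ["uuid-of-repo-A@3"], "x-review": {"approved": true}}'
history = load_history(text)
record_merge(history, ["uuid-of-repo-A@3", "uuid-of-repo-B@7"])
print(dump_history(history))
```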

We can fix some specific problems with the extensible data structure.  Tree change operations (file moves) seem to cause problems.  If we need to, we can keep more information about the tree changes in the merge history file.  Do we need to save the diff that describes the edits to resolve any merge conflicts?  Do we need to save the list of the files that were changed in each revision that went into a merge?  What other information is required for doing good merges?
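As one example of what extra tree information could buy, suppose merge_history recorded the moves made on the target branch as (old path, new path) pairs, in order. The client could then apply an incoming change to a moved file at the file's new location. A minimal sketch; the record format and `map_path` are hypothetical, and it ignores harder cases such as a file moved differently on both branches.

```python
def map_path(path: str, moves: list[tuple[str, str]]) -> str:
    """Translate a path from the source branch into the target branch's current layout.

    moves holds (old, new) pairs in the order they were made on the target branch;
    each one applies to the exact path or to anything beneath a moved directory.
    """
    for old, new in moves:
        if path == old or path.startswith(old + "/"):
            path = new + path[len(old):]
    return path


moves = [("src/util.c", "src/common/util.c"), ("docs", "manual")]
print(map_path("src/util.c", moves))      # src/common/util.c
print(map_path("docs/intro.txt", moves))  # manual/intro.txt
print(map_path("docsx/readme", moves))    # docsx/readme (only whole path components match)
```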

Prototype

We will need a prototype of NewMerge.  Then we can document it as a community project and start improving it. I think that we can start with the existing merge implementation.

  • Apply the restriction that we only merge and track merges on complete branches.  This will immediately make it simpler and more reliable.
  • Remove the extra code for dealing with subtree mergeinfo
  • Move the mergeinfo into the new merge_history file and extensible data structure
  • Fix the simple case of "reintegrate" from a feature branch to a trunk with multiple iterations. The feature developer should be able to say "newmerge <from trunk>" at any time to get updates, and the trunk should be able to say "newmerge <from branch>" at any time to try the new feature.  Currently, this only works if both sides provide special instructions.
  • Add the GUID tracking of revisions and merges from foreign repositories.

Many thanks to the WANdisco team, including Subversion contributors Hyrum Wright, Philip Martin, and Julian Foad. They have shaped this project and will participate. I look forward to working with you on this, and I will be excited when Assembla supports all of the new coding workflows for Subversion.

Check out our implementation of Subversion Merge Requests for free with Assembla Renzoku.

Written by Andy Singleton

Working on Continuous Agile and Accelerating Innovation, Assembla CEO and startup founder