How and/or why is merging in Git better than in SVN?

Problem description

I've heard in a few places that one of the main reasons distributed version control systems shine is much better merging than in traditional tools like SVN. Is this actually due to inherent differences in how the two systems work, or do specific DVCS implementations like Git/Mercurial just have cleverer merging algorithms than SVN?

Solution

The claim that merging is better in a DVCS than in Subversion was largely based on how branching and merging worked in Subversion a while ago. Subversion prior to 1.5.0 didn't store any information about when branches were merged, so when you wanted to merge you had to specify which range of revisions had to be merged.

So why did Subversion merges suck?

Ponder this example:

      1   2   4     6     8
trunk o-->o-->o---->o---->o
       \
        \   3     5     7
b1       +->o---->o---->o

When we want to merge b1's changes into the trunk, we'd issue the following command while standing in a working copy that has trunk checked out:

svn merge -r 2:7 {link to branch b1}

… which will attempt to merge the changes from b1 into your local working copy. After resolving any conflicts and testing the result, you commit the changes. Once committed, the revision tree would look like this:

      1   2   4     6     8   9
trunk o-->o-->o---->o---->o-->o      "the merge commit is at r9"
       \
        \   3     5     7
b1       +->o---->o---->o

However, this way of specifying revision ranges quickly gets out of hand as the version tree grows, because Subversion didn't have any metadata on when and which revisions got merged together. Ponder what happens later:

           12        14
trunk  …-->o-------->o
                                     "Okay, so when did we merge last time?"
              13        15
b1     …----->o-------->o
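Answering "when did we merge last time?" comes down to bookkeeping. The sketch below uses a hypothetical helper, `unmerged_ranges`, which is not part of Subversion. It assumes you have recorded the set of branch revisions already merged, which is conceptually what Subversion's merge tracking later stored for you. From that set it builds `svn merge -r N:M` ranges, where `-r N:M` covers the changes made in revisions N+1 through M. For illustration it also assumes b1 consists only of the revisions drawn in the diagrams (3, 5, 7, 13 and 15).

```python
def unmerged_ranges(branch_revisions, merged):
    """Return the '-r N:M' ranges still to merge from a branch.

    branch_revisions: revisions committed on the branch, ascending.
    merged: set of revisions already merged (what merge tracking records).
    """
    ranges, group = [], []
    for rev in branch_revisions:
        if rev in merged:
            if group:              # an already-merged revision ends the run
                ranges.append(group)
                group = []
        else:
            group.append(rev)
    if group:
        ranges.append(group)
    # "-r N:M" selects the changes made in revisions N+1 through M.
    return [f"-r {g[0] - 1}:{g[-1]}" for g in ranges]


print(unmerged_ranges([3, 5, 7], merged=set()))  # ['-r 2:7'], the first merge

branch_b1 = [3, 5, 7, 13, 15]  # assumed: only the revisions from the diagrams
print(unmerged_ranges(branch_b1, merged={3, 5, 7}))  # ['-r 12:15']
```

Without a reliable record of `merged`, nothing stops you from re-applying revisions 3 to 7 on the next merge. That is one way the nasty conflicts described next can arise.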

This is largely an issue with Subversion's repository design. To create a branch, you create a new virtual directory in the repository that houses a copy of the trunk, but nothing records when and what got merged back in. That leads to nasty merge conflicts at times. Even worse, Subversion used two-way merging by default, which has some crippling limitations in automatic merging when the two branch heads are not compared against their common ancestor.
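To see why the common ancestor matters, here is a toy illustration of the per-item decision rule, not Subversion's merge code. The "file" is modeled as a dictionary of named slots rather than lines of text, and `two_way_merge` and `three_way_merge` are hypothetical names. Real tools apply the same three-way rule to diff hunks.

```python
# Sentinel marking a slot that could not be merged automatically.
CONFLICT = object()


def two_way_merge(ours, theirs):
    """Merge without a common ancestor: any disagreement is ambiguous."""
    merged = {}
    for key in sorted(set(ours) | set(theirs)):
        a, b = ours.get(key), theirs.get(key)
        # With only two versions we cannot tell which side made the change.
        merged[key] = a if a == b else CONFLICT
    return merged


def three_way_merge(base, ours, theirs):
    """Merge using the common ancestor (base) to see who changed what.

    A value of None means "absent", so deletions are handled too.
    """
    merged = {}
    for key in sorted(set(base) | set(ours) | set(theirs)):
        o, a, b = base.get(key), ours.get(key), theirs.get(key)
        if a == b:        # both sides agree (or neither touched it)
            result = a
        elif a == o:      # only "theirs" changed it: take their version
            result = b
        elif b == o:      # only "ours" changed it: keep our version
            result = a
        else:             # both changed it differently: a real conflict
            result = CONFLICT
        if result is not None:
            merged[key] = result
    return merged


base = {"greeting": "hello", "color": "red"}
ours = {"greeting": "hello", "color": "blue"}  # we changed "color"
theirs = {"greeting": "hi", "color": "red"}    # they changed "greeting"

print(three_way_merge(base, ours, theirs))  # {'color': 'blue', 'greeting': 'hi'}
print(two_way_merge(ours, theirs) == {"color": CONFLICT, "greeting": CONFLICT})  # True
```

The two edits do not overlap, yet the two-way version reports both slots as conflicts because it cannot tell who changed what. The three-way version resolves both by consulting `base`.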

To mitigate this, Subversion (since 1.5.0) now stores metadata for branching and merging. That would solve all problems, right?

And oh, by the way, Subversion still sucks…

On a centralized system like Subversion, virtual directories suck. Why? Because everyone can see them… even the garbage experimental ones. Branching is good if you want to experiment, but you don't want to see everyone's (and their aunt's) experiments. This is serious cognitive noise. The more branches you add, the more crap you get to see.

The more public branches you have in a repository, the harder it is to keep track of them all. The question you'll keep asking is whether a branch is still in development or really dead, which is hard to tell in any centralized version control system.

From what I've seen, most organizations default to using one big branch anyway. That is a shame, because it makes it difficult to keep track of testing and release versions, and of whatever else is good about branching.

So why are DVCS, such as Git, Mercurial and Bazaar, better than Subversion at branching and merging?

There is a very simple reason: branching is a first-class concept. By design there are no virtual directories. Branches are hard objects in a DVCS, and they need to be for synchronization between repositories (i.e. push and pull) to work simply.

The first thing you do when you work with a DVCS is to clone a repository (git's clone, hg's clone and bzr's branch). Cloning is conceptually the same thing as creating a branch in version control. Some call this forking or branching (although the latter is often also used to refer to co-located branches), but it's the same thing. Every user runs their own repository, which means you have per-user branching going on.

The version structure is not a tree but a graph: more specifically, a directed acyclic graph (DAG, meaning a graph that doesn't have any cycles). You don't really need to dwell on the specifics of a DAG beyond one fact. Each commit (other than the very first) has one or more parent references, pointing to the commit(s) it was based on. Because of this, the following graphs draw the arrows between revisions in reverse, from each commit back to its parent.
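As a rough mental model (not git's actual object format), a repository can be sketched as two maps. The first maps each commit ID to its parent IDs; the second maps branch names to commit IDs. All names below are illustrative. In this model a branch is only a movable pointer into the DAG, and a branch's history is whatever you reach by following parent references.

```python
# Hypothetical model: each commit ID maps to the IDs of its parents.
# The root commit "a" has no parents; a merge commit would have two.
parents = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
}

# A branch is just a named pointer to one commit (a "ref").
refs = {"master": "c"}


def history(parents, tip):
    """Return every commit reachable from `tip` by following parent links."""
    order, seen, stack = [], set(), [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        order.append(commit)
        stack.extend(parents[commit])
    return order


print(history(parents, refs["master"]))  # ['c', 'b', 'a']

# Creating a branch copies a pointer, not a directory of files.
refs["experiment"] = refs["master"]
```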

A very simple example of merging would be this: imagine a central repository called origin and a user, Alice, cloning the repository to her machine.

         a…   b…   c…
origin   o<---o<---o
                   ^master
         |
         | clone
         v

         a…   b…   c…
alice    o<---o<---o
                   ^master
                   ^origin/master

During a clone, every revision is copied to Alice exactly as it was (which is validated by the uniquely identifiable hash IDs), and the clone records where origin's branches are (shown as origin/master).
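The "uniquely identifiable hash IDs" can be sketched as follows. This is a simplification, not git's real object encoding (git also hashes the tree, author, message and so on). Here a commit's ID is the SHA-1 of its parent IDs and its content, so the same commit gets the same ID in every repository. A clone can therefore re-hash what it receives to check it. `commit_id`, `add_commit` and `clone` are hypothetical helpers.

```python
import hashlib


def commit_id(parents, content):
    """Derive a commit's ID from its parent IDs and its content."""
    payload = "parents:" + ",".join(parents) + "\n" + content
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def add_commit(store, parents, content):
    """Store a new commit under its content-derived ID and return the ID."""
    cid = commit_id(parents, content)
    store[cid] = {"parents": list(parents), "content": content}
    return cid


def clone(origin_store, origin_refs):
    """Copy every commit as-is, verify each ID, and record origin's branches."""
    store = {}
    for cid, commit in origin_store.items():
        if commit_id(commit["parents"], commit["content"]) != cid:
            raise ValueError(f"corrupted commit {cid}")
        store[cid] = {"parents": list(commit["parents"]),
                      "content": commit["content"]}
    local_refs = dict(origin_refs)  # e.g. "master"
    remote_refs = {f"origin/{name}": cid for name, cid in origin_refs.items()}
    return store, local_refs, remote_refs


# Build origin's history a <- b <- c, then clone it for Alice.
origin_store = {}
a = add_commit(origin_store, [], "a")
b = add_commit(origin_store, [a], "b")
c = add_commit(origin_store, [b], "c")
origin_refs = {"master": c}

alice_store, alice_refs, alice_remote = clone(origin_store, origin_refs)
print(alice_refs["master"] == alice_remote["origin/master"] == c)  # True
```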

Alice then works on her repo, commits in her own repository, and decides to push her changes:

         a…   b…   c…
origin   o<---o<---o
                   ^ master

              "what'll happen after a push?"


         a…   b…   c…   d…   e…
alice    o<---o<---o<---o<---o
                             ^master
                   ^origin/master

The solution is rather simple: the only thing the origin repository needs to do is take in all the new revisions and move its branch to the newest revision (which git calls a "fast-forward"):

         a…   b…   c…   d…   e…
origin   o<---o<---o<---o<---o
                             ^ master

         a…   b…   c…   d…   e…
alice    o<---o<---o<---o<---o
                             ^master
                             ^origin/master

The use case illustrated above doesn't even need to merge anything. So the issue really isn't the merging algorithm, since the three-way merge algorithm is pretty much the same across all version control systems. The issue is more about structure than anything.

So how about you show me an example that has a real merge?

Admittedly, the above example is a very simple use case, so let's do a much more twisted, albeit more common, one. Remember that origin started out with three revisions? Well, the guy who made them (let's call him Bob) has been working on his own and made a commit in his own repository:

         a…   b…   c…   f…
bob      o<---o<---o<---o
                        ^ master
                   ^ origin/master

                   "can Bob push his changes?" 

         a…   b…   c…   d…   e…
origin   o<---o<---o<---o<---o
                             ^ master

Now Bob can't push his changes directly to the origin repository. The system detects this by checking whether Bob's revisions directly descend from origin's, which in this case they don't. Any attempt to push will result in the system saying something akin to "Uh... I'm afraid I can't let you do that, Bob."
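The rule the receiving repository applies can be sketched with a simple ancestry test. A push is accepted as a fast-forward only if the branch's current tip on the receiving side is an ancestor of (or equal to) the pushed tip. The parent map and helper names below are hypothetical and modeled on the diagrams above. For simplicity, the sketch assumes the pushed commits are already present in `parents`, and it ignores forced pushes.

```python
def is_ancestor(parents, ancestor, descendant):
    """True if `ancestor` is reachable from `descendant` via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return False


def receive_push(parents, refs, branch, new_tip):
    """Accept a push only if it fast-forwards `branch` to `new_tip`."""
    old_tip = refs[branch]
    if not is_ancestor(parents, old_tip, new_tip):
        return "rejected: non-fast-forward"
    refs[branch] = new_tip  # fast-forward: just move the pointer
    return "accepted"


# Commits from the diagrams: Alice added d and e; Bob independently added f.
parents = {"a": [], "b": ["a"], "c": ["b"],
           "d": ["c"], "e": ["d"],   # Alice's work
           "f": ["c"]}               # Bob's work
origin_refs = {"master": "c"}

print(receive_push(parents, origin_refs, "master", "e"))  # accepted
print(receive_push(parents, origin_refs, "master", "f"))  # rejected: non-fast-forward
```

Alice's push succeeds because `c` is an ancestor of `e`. Bob's fails because `e` is not an ancestor of `f`: his history goes `f`, `c`, `b`, `a` and never passes through origin's new tip.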

So Bob has to pull in and then merge the changes (with git's pull, hg's pull and merge, or bzr's merge). This is a two-step process. First, Bob has to fetch the new revisions, which copies them as they are from the origin repository. We can now see that the graph diverges:

                        v master
         a…   b…   c…   f…
bob      o<---o<---o<---o
                   ^
                   |    d…   e…
                   +----o<---o
                             ^ origin/master

         a…   b…   c…   d…   e…
origin   o<---o<---o<---o<---o
                             ^ master

The second step of the pull process is to merge the diverging tips and make a commit of the result:

                                 v master
         a…   b…   c…   f…       1…
bob      o<---o<---o<---o<-------o
                   ^             |
                   |    d…   e…  |
                   +----o<---o<--+
                             ^ origin/master
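The second step can be sketched in the same toy model. First it finds the merge base: the nearest common ancestor of the two tips, `c` here, which is the base a three-way merge compares both sides against. It then records a merge commit with two parents. The helper names and parent map are hypothetical. The sketch takes the first common ancestor found by a breadth-first walk, which is enough for this simple graph; histories with several candidate bases need more care in real tools.

```python
from collections import deque


def ancestors(parents, tip):
    """All commits reachable from `tip`, including `tip` itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge_base(parents, left, right):
    """First common ancestor met by a breadth-first walk back from `right`."""
    left_history = ancestors(parents, left)
    queue, seen = deque([right]), {right}
    while queue:
        commit = queue.popleft()
        if commit in left_history:
            return commit
        for parent in parents[commit]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None  # the two histories are unrelated


parents = {"a": [], "b": ["a"], "c": ["b"],
           "d": ["c"], "e": ["d"],  # fetched from origin (origin/master)
           "f": ["c"]}              # Bob's local commit (master)

base = merge_base(parents, "f", "e")
print(base)  # c

# The merge commit "1" records both tips as parents; master moves to it.
parents["1"] = ["f", "e"]
refs = {"master": "1", "origin/master": "e"}
# origin's tip "e" is now an ancestor of "1", so pushing "1" fast-forwards.
print("e" in ancestors(parents, refs["master"]))  # True
```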

Hopefully the merge won't run into conflicts (if you anticipate them, you can do the two steps manually in git with fetch and merge). What remains is to push those changes to origin, which results in a fast-forward, since the merge commit is a direct descendant of the latest commit in the origin repository:

                                 v origin/master
                                 v master
         a…   b…   c…   f…       1…
bob      o<---o<---o<---o<-------o
                   ^             |
                   |    d…   e…  |
                   +----o<---o<--+

                                 v master
         a…   b…   c…   f…       1…
origin   o<---o<---o<---o<-------o
                   ^             |
                   |    d…   e…  |
                   +----o<---o<--+

There is an alternative to merging in git and hg, called rebase, which moves Bob's changes to after the newest changes. Since I don't want this answer to be any more verbose, I'll let you read the git, Mercurial or Bazaar docs about that instead.
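Rebase can be sketched in the same toy spirit: Bob's changes are replayed one by one on top of origin's newest commit. In this model a commit's ID depends on its parent, so each replayed commit gets a new ID. That is why rebasing rewrites Bob's commits instead of adding a merge commit. The helper names, the single-parent simplification and the hashing scheme are assumptions, not git's or Mercurial's implementation.

```python
import hashlib


def commit_id(parent, change):
    """Toy content-derived ID: a hash of the parent ID plus the change."""
    return hashlib.sha1(f"{parent}\n{change}".encode("utf-8")).hexdigest()


def rebase(changes, onto):
    """Replay `changes` in order on top of commit `onto`; return the new IDs."""
    new_ids, parent = [], onto
    for change in changes:
        cid = commit_id(parent, change)
        new_ids.append(cid)
        parent = cid  # each replayed commit builds on the previous one
    return new_ids


c = commit_id("", "c")  # shared history, collapsed into one commit
d = commit_id(c, "d")   # origin's new commits
e = commit_id(d, "e")
f = commit_id(c, "f")   # Bob's original commit, based on c

[f_rebased] = rebase(["f"], onto=e)
print(f_rebased == commit_id(e, "f"))  # True: the same change, now based on e
print(f_rebased != f)                  # True: a new parent means a new ID
```

Because the rebased commit has `e` as its only parent, Bob's history stays linear and no merge commit like `1` is needed; the original `f` is replaced rather than kept.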

As an exercise for the reader, try drawing out how it works with another user involved; it goes much like the example with Bob above. Merging between repositories is easier than you'd think, because all the revisions/commits are uniquely identifiable.
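Why unique IDs make this easy can be sketched in a few lines. When commits from several repositories are pooled, the shared history collapses automatically because identical commits have identical IDs, and only genuinely new commits need to be transferred. The repository contents are hypothetical: the letters from the diagrams stand in for hash IDs, and Carol (with her commit `g`) is an invented third user.

```python
def missing_commits(have, other):
    """Commits present in `other` but not yet in `have`, keyed by unique ID."""
    return {cid: ps for cid, ps in other.items() if cid not in have}


# Each repository maps commit ID -> parent IDs.
origin = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}
bob = {"a": [], "b": ["a"], "c": ["b"], "f": ["c"]}
carol = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "g": ["d"]}  # hypothetical

# Fetching only transfers what the receiver lacks; shared commits are skipped.
print(sorted(missing_commits(bob, origin)))  # ['d', 'e']
print(sorted(missing_commits(bob, carol)))   # ['d', 'g']

# Pooling all three never duplicates a commit, however it arrived.
pooled = {**origin, **bob, **carol}
print(sorted(pooled))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```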

There is also the issue of sending patches between developers, which was a huge problem in Subversion and is mitigated in git, hg and bzr by uniquely identifiable revisions. Once someone has merged their changes (i.e. made a merge commit) and shared it with the rest of the team, either by pushing to a central repository or by sending patches, the others don't have to worry about that merge, because it has already happened. Martin Fowler calls this way of working promiscuous integration.

Because the structure differs from Subversion's, using a DAG instead, branching and merging become easier not only for the system but for the user as well.
