How and/or why is merging in Git better than in SVN?


Question

I've heard in a few places that one of the main reasons distributed version control systems shine is that merging is much better than in traditional tools like SVN. Is this actually due to inherent differences in how the two systems work, or do specific DVCS implementations like Git and Mercurial just have cleverer merging algorithms than SVN?

Recommended answer

The claim that merging is better in a DVCS than in Subversion was largely based on how branching and merging worked in Subversion a while ago. Subversion prior to 1.5.0 didn't store any information about when branches were merged, so when you wanted to merge you had to specify the range of revisions to be merged.

Consider this example:

      1   2   4     6     8
trunk o-->o-->o---->o---->o
       
           3     5     7
b1       +->o---->o---->o

When we want to merge b1's changes into the trunk, we issue the following command from a folder that has trunk checked out:

svn merge -r 2:7 {link to branch b1}

… which will attempt to merge the changes from b1 into your local working directory. You then commit the changes after you have resolved any conflicts and tested the result. After you commit, the revision tree looks like this:

      1   2   4     6     8   9
trunk o-->o-->o---->o---->o-->o      "the merge commit is at r9"
       
           3     5     7
b1       +->o---->o---->o

However, this way of specifying revision ranges quickly gets out of hand as the version tree grows, because Subversion didn't have any metadata on when and which revisions got merged together. Ponder what happens later:

           12        14
trunk  …-->o-------->o
                                     "Okay, so when did we merge last time?"
              13        15
b1     …----->o-------->o
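Without recorded merge metadata, answering that question is manual bookkeeping. A minimal sketch of it, assuming you wrote down the last branch revision you merged; `next_merge_range` is a hypothetical helper for illustration, not part of Subversion:

```python
# Hypothetical bookkeeping helper; nothing here is a Subversion API.
def next_merge_range(last_merged_rev, branch_revs):
    """Return the (start, end) pair you would pass to `svn merge -r start:end`,
    or None when the branch has no revisions newer than the last merge."""
    pending = [rev for rev in branch_revs if rev > last_merged_rev]
    if not pending:
        return None
    return (last_merged_rev, max(pending))


# Revisions committed on b1 in the diagrams above.
b1_revs = [3, 5, 7, 13, 15]

# The first merge covered -r 2:7, so the next one starts at 7.
print(next_merge_range(7, b1_revs))   # (7, 15)
print(next_merge_range(15, b1_revs))  # None
```

If the remembered revision is wrong, changes are either applied twice or skipped, which is where the nasty conflicts come from.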

This is largely an issue with Subversion's repository design: to create a branch you create a new virtual directory in the repository that houses a copy of the trunk, but nothing is stored about when and what got merged back in. That leads to nasty merge conflicts at times. Even worse, Subversion used two-way merging by default, which has some crippling limitations for automatic merging because the two branch heads are not compared with their common ancestor.
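To illustrate why the common ancestor matters, here is a rough sketch of a line-by-line three-way merge. It assumes all three versions have the same number of lines and that lines are already aligned, which real merge tools do not assume (they diff first). The function names are made up for this example.

```python
def two_way_diff(ours, theirs):
    """Indexes where the two heads differ. Without the ancestor we cannot
    tell which side made the change, so every index is a potential conflict."""
    return [i for i, (o, t) in enumerate(zip(ours, theirs)) if o != t]


def three_way_merge(base, ours, theirs):
    """Merge aligned lines using the common ancestor `base`.
    Returns (merged_lines, conflict_indexes)."""
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:          # both sides agree
            merged.append(o)
        elif o == b:        # only "theirs" changed this line
            merged.append(t)
        elif t == b:        # only "ours" changed this line
            merged.append(o)
        else:               # both changed it differently: real conflict
            merged.append(o)
            conflicts.append(i)
    return merged, conflicts


base = ["a", "b", "c"]
ours = ["a", "B", "c"]     # we changed line 1
theirs = ["a", "b", "C"]   # they changed line 2

print(two_way_diff(ours, theirs))           # [1, 2]
print(three_way_merge(base, ours, theirs))  # (['a', 'B', 'C'], [])
```

The two-way comparison flags both lines, while the three-way merge resolves both automatically because the ancestor shows which side touched each line.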

To mitigate this, Subversion now stores metadata for branching and merging. That would solve all problems, right?

On a centralized system like Subversion, virtual directories suck. Why? Because everyone has access to view them… even the garbage experimental ones. Branching is good if you want to experiment, but you don't want to see everyone's and their aunt's experimentation. This is serious cognitive noise. The more branches you add, the more crap you'll get to see.

The more public branches you have in a repository, the harder it is to keep track of them all. So the question becomes whether a branch is still in development or really dead, which is hard to tell in any centralized version control system.

Most of the time, from what I've seen, an organization will default to using one big branch anyway. That is a shame, because it in turn makes it difficult to keep track of testing and release versions, and of whatever else good comes from branching.

In a DVCS such as git, hg or bzr this is less of a problem, for a very simple reason: branching is a first-class concept. There are no virtual directories by design, and branches are hard objects in a DVCS, which they need to be in order to work simply with synchronization of repositories (i.e. push and pull).

The first thing you do when you work with a DVCS is clone a repository (git's clone, hg's clone and bzr's branch). Cloning is conceptually the same thing as creating a branch in version control. Some call this forking or branching (although the latter is often also used to refer to co-located branches), but it's the same thing. Every user runs their own repository, which means you have per-user branching going on.

The version structure is not a tree but a graph. More specifically, it is a directed acyclic graph (DAG, meaning a graph that doesn't have any cycles). You really don't need to delve into the specifics of a DAG other than that each commit (apart from the very first one) has one or more parent references (the commits it was based on). Because of this, the following graphs show the arrows between revisions in reverse.
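As a rough sketch of that structure, a commit graph can be modelled as a mapping from each commit to its parents. The commit names and the `ancestors` helper below are invented for illustration and are not how any particular DVCS stores its data.

```python
# child -> list of parents; the names are made up for illustration.
parents = {
    "a": [],          # the first commit has no parent
    "b": ["a"],
    "c": ["b"],
    "d": ["c"],
    "e": ["d"],
    "f": ["c"],       # a second line of work that also starts from c
    "m": ["e", "f"],  # a merge commit has two parents
}


def ancestors(commit, parents):
    """Collect every commit reachable by following parent references."""
    seen = set()
    stack = list(parents[commit])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("m", parents)))  # ['a', 'b', 'c', 'd', 'e', 'f']
print(sorted(ancestors("f", parents)))  # ['a', 'b', 'c']
```

Because links only point from child to parent, history can be walked backwards from any commit, which is why the arrows in the diagrams point that way.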

A very simple example of merging would be this: imagine a central repository called origin and a user, Alice, cloning the repository to her machine.

         a…   b…   c…
origin   o<---o<---o
                   ^master
         |
         | clone
         v

         a…   b…   c…
alice    o<---o<---o
                   ^master
                   ^origin/master

What happens during a clone is that every revision is copied to Alice exactly as it was (which is validated by the uniquely identifiable hash IDs), and the clone marks where origin's branches are.
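The idea behind hash IDs can be sketched as follows: an ID is derived from a commit's content together with the IDs of its parents. This is a deliberately simplified scheme for illustration, not the actual object format of git, hg or bzr; `commit_id` is a hypothetical helper.

```python
import hashlib


def commit_id(content, parent_ids):
    """Derive an ID from the commit's content and its parents' IDs
    (simplified; real systems hash more metadata in their own formats)."""
    digest = hashlib.sha1()
    for parent in parent_ids:
        digest.update(b"parent " + parent.encode() + b"\n")
    digest.update(content.encode())
    return digest.hexdigest()


a = commit_id("first change", [])
b = commit_id("second change", [a])

# A faithful copy produces exactly the same ID ...
b_in_clone = commit_id("second change", [a])
print(b == b_in_clone)  # True

# ... while the same change on top of a different history does not.
other_root = commit_id("different history", [])
print(b == commit_id("second change", [other_root]))  # False
```

Since the parent IDs feed into the hash, an ID effectively identifies a commit together with its entire history.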

Alice then works on her repo, committing in her own repository, and decides to push her changes:

         a…   b…   c…
origin   o<---o<---o
                   ^ master

              "what'll happen after a push?"


         a…   b…   c…   d…   e…
alice    o<---o<---o<---o<---o
                             ^master
                   ^origin/master

The solution is rather simple: the only thing the origin repository needs to do is take in all the new revisions and move its branch to the newest revision (which git calls "fast-forward"):

         a…   b…   c…   d…   e…
origin   o<---o<---o<---o<---o
                             ^ master

         a…   b…   c…   d…   e…
alice    o<---o<---o<---o<---o
                             ^master
                             ^origin/master

The use case I illustrated above doesn't even need to merge anything. So the issue really isn't with merging algorithms, since the three-way merge algorithm is pretty much the same in all version control systems. The issue is more about structure than anything.

Admittedly the above example is a very simple use case, so let's do a much more twisted one, albeit a more common one. Remember that origin started out with three revisions? Well, the guy who made them, let's call him Bob, has been working on his own and made a commit in his own repository:

         a…   b…   c…   f…
bob      o<---o<---o<---o
                        ^ master
                   ^ origin/master

                   "can Bob push his changes?" 

         a…   b…   c…   d…   e…
origin   o<---o<---o<---o<---o
                             ^ master

Now Bob can't push his changes directly to the origin repository. The system detects this by checking whether Bob's revisions descend directly from origin's, which in this case they don't. Any attempt to push will result in the system saying something akin to "Uh... I'm afraid I can't let you do that, Bob."
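That descent check can be sketched as a walk over parent links: a push is a plain fast-forward only if origin's current tip is reachable from the tip being pushed. The single-letter commit names follow the diagrams above; `is_ancestor` and `can_fast_forward` are hypothetical helpers, not functions from any real DVCS.

```python
# child -> list of parents, combining Alice's and Bob's commits.
parents = {
    "a": [], "b": ["a"], "c": ["b"],
    "d": ["c"], "e": ["d"],   # Alice's commits
    "f": ["c"],               # Bob's commit
}


def is_ancestor(candidate, commit, parents):
    """True if `candidate` is `commit` itself or reachable via parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current == candidate:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def can_fast_forward(remote_tip, pushed_tip, parents):
    """The remote branch may simply move forward if nothing on it would be lost."""
    return is_ancestor(remote_tip, pushed_tip, parents)


# Alice pushes e while origin is at c: accepted.
print(can_fast_forward("c", "e", parents))  # True
# Bob pushes f while origin is at e: rejected, since e is not in f's history.
print(can_fast_forward("e", "f", parents))  # False
```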

So Bob has to pull in and then merge the changes (with git's pull; or hg's pull and merge; or bzr's merge). This is a two-step process. First Bob has to fetch the new revisions, which copies them as they are from the origin repository. We can now see that the graph diverges:

                        v master
         a…   b…   c…   f…
bob      o<---o<---o<---o
                   ^
                   |    d…   e…
                   +----o<---o
                             ^ origin/master

         a…   b…   c…   d…   e…
origin   o<---o<---o<---o<---o
                             ^ master

The second step of the pull process is to merge the diverging tips and commit the result:

                                 v master
         a…   b…   c…   f…       1…
bob      o<---o<---o<---o<-------o
                   ^             |
                   |    d…   e…  |
                   +----o<---o<--+
                             ^ origin/master
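The merge step in the diagram can be sketched on the same kind of parent mapping: find the common ancestor of the two tips (the base for a three-way merge), then record a commit with both tips as parents. This is a simplified model with invented helper names; real tools also have to handle cases with several merge bases.

```python
parents = {
    "a": [], "b": ["a"], "c": ["b"],
    "d": ["c"], "e": ["d"],   # fetched from origin
    "f": ["c"],               # Bob's own commit
}


def reachable(commit, parents):
    """Return `commit` plus every commit reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(left, right, parents):
    """Common ancestors that are not themselves ancestors of another common ancestor."""
    common = reachable(left, parents) & reachable(right, parents)
    return {
        c for c in common
        if not any(c != other and c in reachable(other, parents) for other in common)
    }


print(merge_bases("f", "e", parents))  # {'c'}

# The merge commit ("1" in the diagram) simply lists both tips as parents.
parents["1"] = ["f", "e"]
print(sorted(reachable("1", parents)))  # ['1', 'a', 'b', 'c', 'd', 'e', 'f']
```

No revision ranges have to be remembered: the graph itself says that c is where the two lines of work diverged.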

Hopefully the merge won't run into conflicts (if you anticipate them, you can do the two steps manually in git with fetch and merge). What needs to be done next is to push those changes to origin again, which will result in a fast-forward merge since the merge commit is a direct descendant of the latest commit in the origin repository:

                                 v origin/master
                                 v master
         a…   b…   c…   f…       1…
bob      o<---o<---o<---o<-------o
                   ^             |
                   |    d…   e…  |
                   +----o<---o<--+

                                 v master
         a…   b…   c…   f…       1…
origin   o<---o<---o<---o<-------o
                   ^             |
                   |    d…   e…  |
                   +----o<---o<--+

There is another option for merging in git and hg, called rebase, which moves Bob's changes to after the newest changes. Since I don't want this answer to be any more verbose, I'll let you read the git, mercurial or bazaar docs about that instead.
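For a rough intuition only, rebasing can be pictured as replaying Bob's change on top of origin's newest commit, which yields a new commit with a different parent and therefore a different ID. The sketch below uses a simplified hash scheme and hypothetical helpers, and it ignores that replaying a change can itself conflict.

```python
import hashlib


def commit_id(content, parent_id):
    """Simplified ID: a hash over the parent ID and the content."""
    return hashlib.sha1(f"{parent_id}\n{content}".encode()).hexdigest()


def rebase(changes, new_base_id):
    """Replay each change, in order, on top of `new_base_id`; return the new IDs."""
    tip = new_base_id
    new_ids = []
    for content in changes:
        tip = commit_id(content, tip)
        new_ids.append(tip)
    return new_ids


# History from the diagrams: a <- b <- c, then d <- e on origin and f from Bob.
a = commit_id("a", "")
b = commit_id("b", a)
c = commit_id("c", b)
d = commit_id("d", c)
e = commit_id("e", d)
f = commit_id("f", c)

(f_rebased,) = rebase(["f"], e)
print(f_rebased != f)                  # True: same change, new identity
print(f_rebased == commit_id("f", e))  # True: its parent is now e
```

The result is a linear history ending in the rebased commit, so the later push is a fast-forward without a merge commit.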

As an exercise for the reader, try drawing out how it works with another user involved. It is done similarly to the example above with Bob. Merging between repositories is easier than you'd think because all the revisions/commits are uniquely identifiable.

There is also the issue of sending patches between developers, which was a huge problem in Subversion and is mitigated in git, hg and bzr by uniquely identifiable revisions. Once someone has merged their changes (i.e. made a merge commit) and sent the result to everyone else on the team, either by pushing to a central repository or by sending patches, the others don't have to worry about the merge, because it has already happened. Martin Fowler calls this way of working promiscuous integration.

Because the structure differs from Subversion's, employing a DAG instead, branching and merging can be done in an easier manner, not only for the system but for the user as well.
