Repetitive merges in Git. How does it calculate differences?

Problem description

I've been researching how Git merge works. I know there are several merge strategies, such as recursive and octopus. I found that resolve and recursive are the most commonly used, and that a recursive merge is only useful when there are several common ancestors (merge bases).

However, I couldn't find which algorithm is used (or how the ancestor is calculated) when a branch is merged into master repeatedly.

A simple example. Let's create a project with one file, "A":

A

Then create another file "B" and commit it to master:

A
B

Then I create a branch from the very first version, which only had the one file "A", and create another file "C". So my branch looks like this:

A
C

Then I decide to merge my branch changes to master and I get:

A
B
C

Then I decide to go back to my branch and continue my work from there. I create another file "D":

A
C
D

Now I want to merge my changes from branch back to the trunk. How is the ancestor calculated?

A visual example:

If I take the ancestor "AC", it should say that "B" is also a new addition because it did not exist in two versions: branch and ancestor.

If I take the ancestor "ABC", it should say that "B" is deleted since B existed in two versions: master and ancestor.

Both of these options look incorrect. I tried to figure it out using "Plastic SCM", which has a merge explanation feature. It shows that the ancestor/base being used is version "AC", yet it still correctly calculates how many files were added (only 1 and not 2).

Solution

To both summarize the comments, and address the question as asked...

Finding a merge base

  1. Git computes the merge base of a pair of commits using an algorithm for finding the Lowest Common Ancestor of a Directed Acyclic Graph. The precise algorithm is not described anywhere and may change, as long as the new one produces correct results. See also Algorithm to find lowest common ancestor in directed acyclic graph?

    There may be multiple LCAs. In this case, the -s resolve merge strategy picks one of them. You have no control over which one it picks. The -s recursive merge strategy runs git merge on them, two at a time, as if by the following:

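    # Bash sketch; git-merge-recursively-inner is a stand-in, not a real Git command.
    commits=($(git merge-base --all "$left" "$right"))
    if [ "${#commits[@]}" -gt 1 ]; then
        a=${commits[0]}
        for b in "${commits[@]:1}"; do
            # merge the running result with the next base into a temporary commit
            a=$(git-merge-recursively-inner "$a" "$b")
        done
        commits=("$a")
    fi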
    

    (in pseudo-code). Note that the inner recursive merge may itself find more than one merge base; if so, it uses this algorithm to merge them.

    The final result is a single commit, $commits[0]. This is the merge base.

  2. In any case, now that we have a single merge base commit—from the LCA-finding algorithm that only found one LCA, or by merge-recursive merging the multiple merge bases that came out of the LCA-finding algorithm, or by merge-resolve just picking one commit from the list—we can look at how git merge-(recursive|resolve) actually merges files. It must run two internal git diff operations, each with the rename detector turned on.
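To tie the merge-base step back to the question, here is a minimal Python sketch of one way to find the lowest common ancestors of two commits. It is not Git's actual algorithm (which, as noted, is not specified); the commit names `c1`, `c2`, `c3`, `m1`, `c4` and the `PARENTS` table are made up to mirror the A / AB / AC / ABC / ACD history from the question.

```python
from typing import Dict, List, Set

# Hypothetical commit graph for the question's example: commit -> parents.
PARENTS: Dict[str, List[str]] = {
    "c1": [],            # initial commit: A
    "c2": ["c1"],        # master: A, B
    "c3": ["c1"],        # branch: A, C
    "m1": ["c2", "c3"],  # master after the first merge: A, B, C
    "c4": ["c3"],        # branch after more work: A, C, D
}


def ancestors(commit: str) -> Set[str]:
    """Return the commit itself plus everything reachable through parents."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def merge_bases(left: str, right: str) -> List[str]:
    """Common ancestors that are not an ancestor of another common ancestor."""
    common = ancestors(left) & ancestors(right)
    return sorted(
        c for c in common
        if not any(c != d and c in ancestors(d) for d in common)
    )


# Second merge in the question: master (m1) with the branch tip (c4).
print(merge_bases("m1", "c4"))  # ['c3'], i.e. the "AC" snapshot
```

So for the second merge the base is the "AC" commit: the first merge made the old branch tip an ancestor of master, which moves the merge base forward from the original "A" commit.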

Diffs, and file identity / rename detection

A file difference engine compares two files. We put one file on the left and another file on the right. Where the two files match up, the diff says nothing. Where the two files differ, the difference engine (depending on how good it is) comes up with some set of changes we can apply to make the left-side file's content match the right-side file's content.

To diff a pair of commits, Git puts one on the left and one on the right. Then it must pair up files in these two commits. Git can do this with a rename detector enabled, or not.

The picture is pretty clear when there is no rename detector. Files on the left and right are "the same file" if and only if they have the same name. Adding the rename-detector identifies (marks as "the same") some file(s) on the left and right sides of a diff, even if the names have changed.

Git's existing rename detector is undergoing some changes to make it better. The exact details are not required here: all we need to know is that it will say that some files are renamed, so are "the same" file, even if they have different names. Other files are automatically "the same" file because they have the same names.

For each paired-up file, the difference engine produces a set of changes that will make the left-side file become the right-side file. The rename detector produces rename operations that are required to be executed first. Files that are new on the right side are called added, and files that existed in the left-side commit but do not exist in the right-side commit are deleted.

Hence, the diff-of-pair-of-commits results in:

  • files to rename (from old-name to new-name)
  • files to add
  • files to delete

plus some sets of changes for files that exist in both commits, as required.
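As a rough illustration of that result, here is a Python sketch that diffs two snapshots modeled as plain dictionaries (path to content). The `Snapshot` type, the `diff_snapshots` name, and the exact-content rename rule are assumptions made for this example; Git's real rename detector works on content similarity and is considerably more involved.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # path -> file content (toy model of a commit)


def diff_snapshots(left: Snapshot, right: Snapshot) -> Dict[str, list]:
    """Classify paths as renamed, added, deleted, or modified."""
    deleted = sorted(set(left) - set(right))
    added = sorted(set(right) - set(left))
    renamed: List[Tuple[str, str]] = []

    # Toy rename detection: a deleted path and an added path with
    # identical content are treated as the same file under a new name.
    for old in list(deleted):
        for new in list(added):
            if left[old] == right[new]:
                renamed.append((old, new))
                deleted.remove(old)
                added.remove(new)
                break

    modified = sorted(
        p for p in set(left) & set(right) if left[p] != right[p]
    )
    return {
        "renamed": renamed,
        "added": added,
        "deleted": deleted,
        "modified": modified,
    }


base = {"A": "a\n", "C": "c\n"}
master = {"A": "a\n", "B": "b\n", "C": "c\n"}
branch = {"A": "a\n", "C": "c\n", "D": "d\n"}

print(diff_snapshots(base, master)["added"])  # ['B']
print(diff_snapshots(base, branch)["added"])  # ['D']
```

With the "AC" snapshot on the left, master shows B as added and the branch shows D as added, and neither side shows a deletion.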

Merging, given a merge base

Given a single merge base commit, both the resolve and recursive strategies proceed in the same way:

  • Diff the merge base against HEAD, with rename detection enabled. These are our changes.
  • Diff the merge base against the other commit, with rename detection enabled. These are their changes.
  • Combine the changes.
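The three steps above can be sketched at whole-file granularity in Python. This is a simplified model under stated assumptions: snapshots are dictionaries, renames are ignored, and a file changed differently on both sides is reported as a conflict instead of being merged line by line. The names `merge_trees` and `Snapshot` are invented for the example.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # path -> file content (toy model of a commit)


def merge_trees(
    base: Snapshot, ours: Snapshot, theirs: Snapshot
) -> Tuple[Snapshot, List[str]]:
    """Three-way merge per path; a missing path is represented by None."""
    result: Snapshot = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            merged = o          # both sides agree (including both deleted)
        elif o == b:
            merged = t          # only they changed this path
        elif t == b:
            merged = o          # only we changed this path
        else:
            conflicts.append(path)  # both changed it, differently
            continue
        if merged is not None:
            result[path] = merged
    return result, conflicts


# The second merge from the question: base "AC", ours "ABC", theirs "ACD".
base = {"A": "a\n", "C": "c\n"}
ours = {"A": "a\n", "B": "b\n", "C": "c\n"}
theirs = {"A": "a\n", "C": "c\n", "D": "d\n"}

merged, conflicts = merge_trees(base, ours, theirs)
print(sorted(merged), conflicts)  # ['A', 'B', 'C', 'D'] []
```

This is why using "AC" as the base gives the right answer: B counts as our addition (it is already on master), D counts as their addition, and only D is new to master after the merge.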

"Combining" requires addressing both high-level changes, such as rename, add, and delete, and low-level changes within a single file. The file to which combined changes will be applied is the file from the merge base. That guarantees that the result works in all cases.

For instance, suppose we renamed a file and they modified the file we renamed. The combined changes say, in effect, at the end, rename file base.ext to head.ext; meanwhile, change line 17 of base.ext. So we'll change line 17, and rename the file, capturing both actions.

High level operations can conflict! For instance, if we rename a file and they delete it, that is a high level conflict. If both we and they rename a file, that is a conflict unless we both chose the same final name. If both we and they delete a file, that combines well with the obvious result.

Low level changes can also conflict. A conflict occurs if we and they both modify the same lines in different ways, or if our changes and their changes "touch" at either edge. For instance, if we replace lines 9 and 10 (delete 2 lines after line 8 and insert 2 lines after line 8) and they replace lines 11 and 12, our changes abut. Out of general caution, Git calls this a conflict.

Of course, if we and they make the same changes to the same original lines, that is not a conflict. Git simply takes one copy of those changes.

The extended option -Xours or -Xtheirs resolves low level conflicts by choosing one side (ours or theirs) to take, ignoring the other side. This works only for low level conflicts. Logically, it could apply to high level conflicts too, but it just doesn't.
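The low-level rules from the last few paragraphs can be modeled in a few lines of Python. This is a toy model, not how Git's merge code is written: the `Hunk` class, the `combine` function, and the `favor` parameter (standing in for -Xours / -Xtheirs) are all hypothetical, and hunks are described only by the inclusive range of base lines they replace.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hunk:
    start: int                  # first replaced base line (1-based)
    end: int                    # last replaced base line (inclusive)
    new_lines: Tuple[str, ...]  # replacement text


def combine(ours: Hunk, theirs: Hunk, favor: Optional[str] = None) -> List[Hunk]:
    """Combine one hunk from each side; raise ValueError on a conflict."""
    if ours == theirs:
        return [ours]  # same change to the same lines: take one copy

    # Overlapping or directly adjacent ranges count as touching.
    touching = ours.start <= theirs.end + 1 and theirs.start <= ours.end + 1
    if not touching:
        return sorted([ours, theirs], key=lambda h: h.start)

    if favor == "ours":
        return [ours]
    if favor == "theirs":
        return [theirs]
    raise ValueError(
        f"conflict: lines {ours.start}-{ours.end} vs {theirs.start}-{theirs.end}"
    )


ours = Hunk(9, 10, ("our line 9", "our line 10"))
theirs = Hunk(11, 12, ("their line 11", "their line 12"))

try:
    combine(ours, theirs)
except ValueError as err:
    print(err)  # conflict: lines 9-10 vs 11-12

print(combine(ours, theirs, favor="ours") == [ours])  # True
```

Pure insertions and the high-level rename/delete cases are left out on purpose; they would need more bookkeeping than this sketch carries.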

Having combined all of our and their changes, Git will apply the combined changes to the snapshot found in the merge base commit. The resulting files can be committed automatically if there are no conflicts. This is the default action for these merges; use --no-commit to suppress this default commit.

When merge-recursive uses an inner merge to make a merge base commit, it forcibly commits the result even if there are merge conflicts. You do not get to see what it did with these conflicts, except in whatever shows up in the merge base when your (outer) merge has a conflict as well. (In this case, the merge-base copy of the file is available in index slot 1. Also, if you set merge.conflictStyle to diff3, each work-tree copy of a conflicted file will show the text from the merge base, complete with conflict markers.)
