Git merge internals


Question


This is probably going to end up being a long question, so please bear with me.

I came across an incredible explanation for git merge decisions here: How does git merge work. I am trying to build on this explanation and see if there are any holes in depicting git merge this way. Essentially, the decision of whether or not a line shows up in the merged file can be depicted by a truth table:

W: original file, A: Alice's branch, B: Bob's branch

Based on this truth table, it is straightforward to think up a line-based algorithm to construct the merged file D: build D line by line by looking at the corresponding lines from A and B and making a decision based on the truth table.
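The per-line decision can be sketched in a few lines of Python. This is only an illustration under assumptions: the tuple order (A, B, W) is inferred from the cases discussed in this question, `line_in_merge` is a hypothetical name, and the rule is the textbook three-way rule rather than Git's actual implementation.

```python
from itertools import product


def line_in_merge(a: int, b: int, w: int) -> int:
    """Return 1 if a line should appear in the merged file D, else 0.

    a, b, w are presence bits: 1 if the line exists in Alice's branch,
    Bob's branch, and the original file W respectively (assumed order).
    """
    if a == w:
        return b  # Alice left the line alone, so Bob's version wins
    if b == w:
        return a  # Bob left the line alone, so Alice's version wins
    # Both differ from W; with single presence bits they must agree.
    return a


# Enumerate the whole (hypothetical) truth table.
TRUTH_TABLE = {abw: line_in_merge(*abw) for abw in product((0, 1), repeat=3)}

if __name__ == "__main__":
    for (a, b, w), d in sorted(TRUTH_TABLE.items()):
        print(f"A={a} B={b} W={w} -> D={d}")
```

With one bit per side, the "both changed differently" branch can never occur, which is why this simplified model never reports a conflict.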

My first question is about the case (0, 0, 1). The link I posted above seems to suggest that while this case is actually a conflict, git usually handles it by deleting the line anyway. Can this case ever actually lead to a conflict?

My second question is about the deletion cases: (0, 1, 1) and (1, 0, 1). Intuitively, I feel the way these cases are handled might lead to a problem. Let's say there was a function foo() in W that was never actually called anywhere in the code. In branch A, Alice finally decided to remove foo(). In branch B, however, Bob finally found a use for foo() and wrote another function bar() that calls foo(). Based on the truth table, it seems the merged file will end up deleting foo() and adding bar(), and Bob would be left wondering why foo() doesn't work anymore! This leads me to think that the truth-table model I derived for the 3-way merge is incomplete and missing something. Is it?

Solution

My first question is the case (0, 0, 1)

Some version control systems, like darcs, consider that making the same change (in your case, a deletion) in two branches and then merging them should lead to a conflict. The typical example is when both branches contain this change:

-#define NUMBER_OF_WHATEVER 42
+#define NUMBER_OF_WHATEVER 43

The merge algorithm cannot know whether you want the merge to yield 43 (because this is the value both versions agree on) or 44 (because 42 should be incremented twice).

However, treating this case as a conflict causes a lot of spurious conflicts. For example, if one cherry-picks a commit from the master branch to a maintenance branch and later merges the maintenance branch into master, then each line modified by the cherry-pick would lead to a conflict. The conflict markers would also be weird, because they would show the same content on both sides of the conflict marker, like

<<<<<<< HEAD
Hello world
=======
Hello world
>>>>>>> 77976da35a11db4580b80ae27e8d65caf5208086

So most version-control systems, including Git, chose not to report a conflict when both sides of the merge introduce the same change.
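As a rough illustration of that policy, here is a sketch of a three-way merge for a single value. The names `merge_value` and `Conflict` are hypothetical, and the code only mirrors the rule described above; it is not how Git is implemented.

```python
class Conflict(Exception):
    """Raised when the two sides changed the base in different ways."""


def merge_value(base, ours, theirs):
    """Three-way merge of one value under the 'same change is no conflict' rule."""
    if ours == theirs:
        return ours    # both sides agree (possibly both unchanged)
    if ours == base:
        return theirs  # only their side changed
    if theirs == base:
        return ours    # only our side changed
    raise Conflict(f"base={base!r}, ours={ours!r}, theirs={theirs!r}")


if __name__ == "__main__":
    # Both branches changed NUMBER_OF_WHATEVER from 42 to 43.
    print(merge_value(42, 43, 43))
```

Under this rule the merged value is the one both sides agree on. Getting 44 would require knowing that the change means "increment", which a textual merge cannot know.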

My second question is about deletion cases: (0, 1, 1) and (1, 0, 1).

What you are describing are semantic conflicts. They do exist, and you can even find corner cases where the merge result compiles but has different semantics from either of the branches being merged. There's no magic: no textual merge algorithm can detect or resolve semantic conflicts. You have to live with them, or work alone.
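The foo()/bar() scenario from the question can be reproduced with a toy model. This is a sketch under loud assumptions: each function definition is treated as one "line", files are modelled as sets of lines, and `merge_lines` is a hypothetical helper that applies the per-line three-way rule; real merges work on ordered hunks.

```python
from __future__ import annotations


def merge_lines(base: set[str], alice: set[str], bob: set[str]) -> set[str]:
    """Merge three sets of lines with the per-line three-way rule."""
    merged = set()
    for line in base | alice | bob:
        a, b, w = line in alice, line in bob, line in base
        # If Alice left the line alone, take Bob's state. Otherwise take
        # Alice's: Bob either left it alone or made the same change.
        keep = b if a == w else a
        if keep:
            merged.add(line)
    return merged


FOO = "def foo(): return 1"
BAR = "def bar(): return foo() + 1"

base = {FOO}        # W: foo() exists but is unused
alice = set()       # Alice deletes foo()
bob = {FOO, BAR}    # Bob keeps foo() and adds bar(), which calls it

merged = merge_lines(base, alice, bob)

if __name__ == "__main__":
    print(merged)
```

The merge succeeds textually, with no conflict, yet the result contains bar() calling a foo() that no longer exists. Nothing in the line-level information tells the algorithm that the two changes interact.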

In practice, they are rare enough. There are probably millions of people using a version control system daily and living with it. The majority probably never thought the problem could exist.

Still, good organization considerably reduces the risk of semantic conflicts. If you check that your code still compiles after merges, you avoid roughly 90% of semantic conflicts, and if you have an automated test suite, a semantic conflict is only problematic if it creates a bug that your test suite does not cover.
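A tiny sketch of how a test catches the foo()/bar() case from the question. Everything here is hypothetical: `MERGED_SOURCE` stands in for the merge result and `bar_works` stands in for a test suite. Note that in a language like Python, which resolves names at run time, the byte-compile step alone does not flag the missing function, so the test is what does the work.

```python
MERGED_SOURCE = '''
def bar():
    return foo() + 1
'''


def bar_works(source: str) -> bool:
    """Stand-in for a test suite: load the code, then call bar()."""
    compile(source, "<merged>", "exec")  # passes: the syntax is valid
    namespace = {}
    exec(source, namespace)              # defines bar() in a fresh namespace
    try:
        namespace["bar"]()
    except NameError:                    # foo() was deleted by the merge
        return False
    return True


if __name__ == "__main__":
    print(bar_works(MERGED_SOURCE))
```

In a statically compiled language the missing foo() would already fail the build, which is the cheaper of the two checks the answer mentions.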

In fact, semantic conflicts are not specific to version-control systems. Here is a scenario that involves no merge at all:

  • I read the code and see a function f()
  • My coworker removes function f()
  • Working on the latest version, which doesn't have f() anymore, I still remember that there's a function f() and I try to use it.

In short, don't be afraid of semantic conflicts.
