How to make --squash default on a merge?

Problem description

We use separate branches for non-trivial bug fixes and features. The branches are kept in sync with master by performing frequent git checkout <x>; git merge master.

I noticed when merging, git pollutes the log files with multiple, non-relevant messages. For example, rather than a single "Merge <X> into Master" or "Merge Master into <X>", git will add all the commit messages. It's a problem with governance (processes and procedures) on Master because the bugs that may have been present in a branch during development are not and were not ever present in the Master branch.

Worse, the behaviors are different between branches and master. When merging master into branches, there is a log entry generated similar to "Merge Master into <X>". However, when merging a branch into Master, there is no "Merge <X> into Master". According to the logs, it's as if the development branch never existed and the merge never occurred.

I learned I had to do something special to make git behave as expected; namely How to use git merge --squash? (It's classic git modus operandi: take something simple and make it difficult).

My question is, how do I make --squash the default action during a merge?

Solution

To make Git always do a squash "merge" for the master branch:

$ git config branch.master.mergeOptions "--squash"

Explanation

You can't make Git do a squash "merge" by default for all branches, but you can make it do a squash "merge" by default for some branches. Since you are particularly interested in making this happen only for master, that may be just what you want.

Let's do a quick[1] review of what git merge really does since, in the usual Git fashion, Git complicates everything. And, this:

We use separate branches for non-trivial bug fixes and features. The branches are kept in-sync with master by performing frequent git checkout <x>; git merge master.

is reversed from what many people believe to be the "correct" work-flow in Git. I have some doubts as to whether any Git work-flow can be called "correct" :-) , but some are more successful than others, and this is definitely the reverse of one of the more successful ones. (I do think it can work well, as noted in the extended discussion below.)


[1] Well, I tried to keep it short. :-) Feel free to skim, although there's a bunch of important material here. If TL;DR, just jump straight to the end.

The commit graph

As you know, but others may not, in Git, much is controlled by the commit graph. Every[1] commit has some parent commit, or in the case of a merge commit, two or more parents. To make a new commit, we get on some branch:

$ git checkout funkybranch

and do some work in the work-tree, git add some files, and finally git commit the result to branch funkybranch:

... work work work ...
$ git commit -m 'do a thing'

The current commit is the (one, single) commit to which the name funkybranch points. Git finds this by reading HEAD: HEAD normally contains the name of the branch, and the branch contains the raw SHA-1 hash ID of the commit.

To make the new commit, Git reads the ID of the current commit from the branch we're on, saves the index/staging-area into the repository,[2] writes the new commit with the current commit's ID as the new commit's parent, and, last, writes the new commit's ID to the branch information file.

This is how a branch grows: from one commit, we make a new one, and then move the branch name to point to the new commit. When we do this as a linear chain, we get a nice linear history:

... <- C <- D <- E   <-- funkybranch

Commit E (which might actually be e35d9f... or whatever) is the current commit. It points back to D because D was the current commit when we made E; D points back to C because C was current at that point; and so on.
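The chain above can be reproduced in a throwaway repository. This is a minimal sketch rather than anything from the original answer: the temporary directory, the placeholder identity, and the use of empty commits are assumptions made purely for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/funkybranch   # name the not-yet-born branch
git config user.name "Example" && git config user.email "example@example.com"

# Three commits in a row: each new commit records the previous one as its parent.
for name in C D E; do
    git commit -q --allow-empty -m "$name"
done

head_target=$(git symbolic-ref HEAD)            # HEAD holds the branch name
tip=$(git rev-parse funkybranch)                # the branch holds E's hash ID
parent_subject=$(git log -1 --format=%s HEAD^)  # E points back to its parent
echo "HEAD -> $head_target"
echo "funkybranch -> $tip"
echo "parent of E: $parent_subject"
```

The hash printed for funkybranch differs on every run, since it depends on the commit timestamps.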

When we make new branches with, e.g., git checkout -b, all we are doing is telling Git to make a new name, pointing to some existing commit. Usually this is just the current commit. So if we are on funkybranch and funkybranch points to commit E and we run:

$ git checkout -b newbranch

then we get this:

... <- C <- D <- E   <-- funkybranch, newbranch

That is, both names point to commit E. Git knows that we're on newbranch now because HEAD says newbranch. I like to include that in this kind of drawing too:

... <- C <- D <- E   <-- funkybranch, HEAD -> newbranch

I also like to draw my graphs in a bit more compact fashion. We know that commits always point "backwards in time" to their parents, because it's impossible to make new commit E before we've made commit D. So these arrows always point leftward and we can just draw one or two dashes:

...--C--D--E   <-- funkybranch, HEAD -> newbranch

(and then if we don't need to know which commit is which, we can just draw a round o node for each one, but for now I will stick to single uppercase letters here).

If we make a new commit now—commit F—it causes newbranch to advance (because, as we can see from HEAD, we're on newbranch). So let's draw that:

...--C--D--E      <-- funkybranch
            \
             F    <-- HEAD -> newbranch

Now let's git checkout funkybranch again, and do some work there and commit it, making new commit G:

...--C--D--E--G   <-- HEAD -> funkybranch
            \
             F    <-- newbranch

(and HEAD is now pointing to funkybranch). Now we have something we can merge.


[1] Well, every commit except for root commits. In most Git repositories there is just one root commit, which is the very first commit. Obviously it cannot have a parent commit, since the parent of each new commit is whatever commit was current when we made the new commit. With no commits at all, there is no current commit yet when we make the first commit. So it becomes a root commit, and then all later commits are its children, grandchildren, great-grand-children, and so on.

[2] Most of the "save" work actually happens at each git add. The index/staging-area contains hash IDs, rather than actual file contents: the file contents were saved away when you ran git add. This is because Git's graph is not just of commit objects, but of every object in the repository. This is part of what makes Git so fast as compared to, e.g., Mercurial (which saves the files away at commit time rather than add time). Fortunately this, unlike the commit graph itself, is something users need not know or care about.

Git merge

As before, we have to be on some branch.[1] We're on funkybranch, so we are all good to go:

$ git merge newbranch

At this point, most people seem to think that Magic Happens. It's not magic at all though. Git now finds the merge base between our current commit and the one we named, and then runs two git diff commands.

The merge base is simply[2] the first commit "in common" on the two branches: the first commit that is on both branches. We are on funkybranch, which points to G. We gave Git the branch name newbranch, which points to commit F. So we're merging commits G and F, and Git follows both of their parent pointers until it reaches a commit node that is on both branches. In this case, that's commit E: commit E is the merge base.

Now Git runs those two git diff commands. One compares the merge base against the current commit: git diff <id-of-E> <id-of-G>. The second diff compares the merge base against the other commit: git diff <id-of-E> <id-of-F>.

Finally, Git attempts to combine the two sets of changes, writing the result to our current work-tree. If the changes seem independent, Git takes both of them. If they seem to collide, Git stops with a "merge conflict" and makes us clean it up. If they seem to be the same changes, Git takes just one copy of the changes.

All of this "seems" stuff is done on a purely textual basis. Git has no understanding of code. It just sees things like "delete a line reading ++x;" and "add a line reading y *= 2;". Those look different, so as long as they seem to be in different areas, it applies the one delete and the one add to the files in the merge base, putting the result in the work-tree.
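The merge base and the two diffs can be inspected by hand before merging. This is a rough sketch under stated assumptions: the temporary repository, the placeholder identity, the file names, and the commit_file helper are all made up for illustration, and git merge does this work internally rather than by literally invoking these commands.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/funkybranch
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: create one file named after the commit, then commit it.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file E
git branch newbranch             # both names now point to E
commit_file G                    # G on funkybranch
git checkout -q newbranch
commit_file F                    # F on newbranch
git checkout -q funkybranch

base=$(git merge-base funkybranch newbranch)
base_subject=$(git log -1 --format=%s "$base")
ours=$(git diff --name-only "$base" funkybranch)    # merge base vs current commit
theirs=$(git diff --name-only "$base" newbranch)    # merge base vs other commit
echo "merge base: $base_subject, ours: $ours, theirs: $theirs"
```

Because the two diffs touch different files here, the changes look independent and a merge would take both.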

Last, assuming all goes well and the merge does not stop with a conflict, Git makes a new commit. The new commit is a merge commit, which means it has two[3] parents. The first parent (the order matters) is the current commit, just as with regular, non-merge commits. The second parent is the other commit. Once the commit is safely written to the repository, Git writes the new commit's ID into the branch name, as usual. So, assuming the merge works, we get this:

...--C--D--E--G--H  <-- HEAD -> funkybranch
            \  /
              F     <-- newbranch

Note that newbranch has not moved: it still points to commit F. HEAD has not changed either: it still contains the name funkybranch. Only funkybranch has changed: it now points to the new merge commit H, and H points back to G, and also to F.
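The parent order of the merge commit can be checked directly. This is an illustrative sketch only; the temporary repository, identity, file names, and the commit_file helper are assumptions, not part of the original answer.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/funkybranch
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: create one file named after the commit, then commit it.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file E
git branch newbranch
commit_file G
git checkout -q newbranch
commit_file F
git checkout -q funkybranch

git merge -q -m H newbranch                     # true merge: makes commit H

set -- $(git log -1 --format=%P HEAD)           # parent hash IDs of H
parent_count=$#
first=$(git log -1 --format=%s HEAD^1)          # first parent: the commit we were on
second=$(git log -1 --format=%s HEAD^2)         # second parent: the commit merged in
other_tip=$(git log -1 --format=%s newbranch)   # newbranch itself has not moved
echo "H has $parent_count parents: $first, then $second; newbranch is still at $other_tip"
```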


[1] Git is a bit schizoid about this. If we git checkout a raw SHA-1, or anything else that is not a branch name, it goes into a state it calls "detached HEAD". Internally, this works by shoving the SHA-1 hash directly into the HEAD file, so that HEAD gives the commit ID, rather than the name of the branch. But the way Git does everything else makes it work as though we're on a special branch whose name is just the empty string. It's the (single) anonymous branch or, equivalently, it's the branch named HEAD. So in one sense, we're always on a branch: even if Git says that we're not on any branch, Git also says that we're on the special anonymous branch.

This causes a lot of confusion, and it might be more sensible if it weren't allowed, but Git uses it internally during git rebase, so it's actually pretty important. If the rebase goes wrong, this detail leaks out, and you wind up having to know what "detached HEAD" means, and is.

[2] I am deliberately ignoring a hard case here, which occurs when there are multiple possible merge base commits. Mercurial and Git use different solutions here: Mercurial picks one at (what seems to be) random, while Git gives you options. These cases are rare though, and ideally, even when they do occur, Mercurial's simpler method works anyway.

[3] Two or more, really: Git supports the concept of an octopus merge. But there's no need to go there. :-)

Merge changes the graph from a tree to a DAG

Merges—true merges: commits with two or more parents—have a bunch of important—critical, even—side effects. The main one is that the presence of a merge causes the commit graph data structure to change from a tree, where branches simply fork off and grow on their own, into a DAG: a Directed Acyclic Graph.

When Git walks the graph, as it does for so many operations, it usually follows all paths back. Since a merge has two parents, git log, which walks the graph, shows both parent commits. Hence this is considered a Feature:

For example, rather than a single "Merge <X> into Master" or "Merge Master into <X>", git will add all the commit messages.

Git is following, and hence logging, both the original commit sequence—commits H, G, E, D, and so on—and the merged-in commit sequence F, E, D, and so on. Of course, it shows each commit only once; and by default, it sorts these commits by their date-stamps, intermingling the two branches if each one has many commits with dates that overlap.

If you don't want to see the commits that came in via the "other side" of a merge, Git has a way to do that: --first-parent tells every Git command that walks the graph[1] to follow only the first parent of each merge. The other side is still there in the graph, and it still affects how Git computes things like the merge base, but git log --first-parent won't show it.
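The effect of --first-parent is easy to see on the small E, G, F, H graph from earlier. This is a minimal sketch in a throwaway repository; the identity, file names, and the commit_file helper are assumptions made for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/funkybranch
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: create one file named after the commit, then commit it.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file E
git branch newbranch
commit_file G
git checkout -q newbranch
commit_file F
git checkout -q funkybranch
git merge -q -m H newbranch

all_count=$(git rev-list --count HEAD)                   # walks both parents of H
main_count=$(git rev-list --count --first-parent HEAD)   # skips the merged-in side
main_line=$(echo $(git log --first-parent --format=%s))  # subjects joined by spaces
echo "all: $all_count commits, first-parent: $main_count commits ($main_line)"
```

Commit F is still in the repository and still reachable from H; it is only left out of the walk.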


[1] This is quite a lot of Git commands. They use, or in the case of git log itself, are, variants of git rev-list, which is Git's general purpose graph-walk program. This code is central to push, fetch, bisect, log, blame, rebase, and numerous others. Its documentation has a dizzying array of options. The key ones to know as a casual user are --first-parent (just discussed here); --no-walk (suppresses graph walking entirely); --ancestry-path (simplifies history for source tree related work); --simplify-by-decoration (simplifies history for git log output); --branches, --remotes, and --tags (selects starting points for graph walking by branch, remote, or tag name); --merges and --no-merges (include or exclude merge commits); --since and --until (limit commits by date ranges); and the basic .. and ... (two and three dot) graph subsetting operations.

Benefits of merges

Having the merge in place means that development on a branch can continue on that branch, and a later git merge finds a newer—and hence less complicated—merge base. Consider this graph, where only a few commits have single-letter names:

  o--o--o--o--H--o--o--I        <-- feature2
 /             \        \
A--o--B---C-----D--E-----F--G   <-- master
 \       /        /        /
  o--o--J--o--o--K--o--o--L     <-- feature1

Here, except for two early commits done on master after the root commit A, all development has taken place on side branches feature1 and feature2. Commits C, D, E, F, and G are all merges (in this case, strictly into master), bringing the feature-work into master when it was ready.

Note that when we made commit C on master, we did:

$ git checkout master; git merge feature1

which found A as the merge base and B and J as the two tip commits to merge. When we made D:

$ git checkout master; git merge feature2

we had A as the merge base and C and H as the two tip commits. So far, this is nothing special. But when we made E, we had this much so far (the final os, and even I, on feature2 may or may not have been in place—they have no effect):

  o--o--o--o--H--o--o           <-- feature2
 /             \
A--o--B---C-----D               <-- master
 \       /
  o--o--J--o--o--K              <-- feature1

The merge base of master and feature1 is the first commit that is on both branches, which is commit J, which is the one we merged in to make C. So to do this merge, Git compares J vs D—the code we brought in from feature2—and J vs K: the new code (and only the new code) on feature1. If all goes well, or once we fix merge conflicts, this makes commit E and we now have:

  o--o--o--o--H--o--o--I        <-- feature2
 /             \
A--o--B---C-----D--E            <-- master
 \       /        /
  o--o--J--o--o--K--o--o        <-- feature1

when we go to merge feature2 again. This time the merge base is commit H: moving straight back from feature2 soon hits H, and moving from E to D and then up to H from master also hits H. So now Git compares H vs E, which is what we brought in from feature1, and H vs I, which is the new stuff we added to feature2, and merges just those.
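The way a true merge moves the merge base forward can be observed with git merge-base. This is a reduced sketch of the graph above (only A, B, J, C, and K), built in a temporary repository; the identity, file names, and the commit_file helper are assumptions for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: create one file named after the commit, then commit it.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file A
git branch feature1
commit_file B
git checkout -q feature1
commit_file J
git checkout -q master

before=$(git log -1 --format=%s "$(git merge-base master feature1)")
git merge -q -m C feature1        # true merge: C has parents B and J
git checkout -q feature1
commit_file K                     # development continues on feature1
after=$(git log -1 --format=%s "$(git merge-base master feature1)")
echo "merge base before C: $before, after C: $after"
```

The next merge of feature1 therefore only has to consider what changed since J.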

Drawbacks of merges

Trees have some very nice graph-theoretic properties, such as a guarantee of a single simple merge-base. Arbitrary DAGs may lose these properties. In particular, doing merges both ways—merging master into branch and merging branch into master—results in "criss cross merges" that can give you multiple merge bases.

Merges also make the graph (git log) very hard to follow. Using --first-parent or --simplify-by-decoration helps, especially if you practice good merging, but these graphs just naturally get messy.

Squash merges

Squash merges avoid the problems, but do so by paying a fairly heavy price: they are not merges at all. (Soon, we'll see how to deal with this.)

When you run git merge --squash, Git goes through the same motions as before in terms of finding a merge base, and making two diffs: merge-base vs current-commit, and merge-base vs other-commit. It then combines the changes in exactly the same way as for a regular merge. But then it makes an ordinary commit.[1] The new commit has just a single parent, taken from the current branch.

Let's see that in action for the same sequence with feature1 and feature2:

  o--o--o                       <-- feature2
 /
A--o--B                         <-- master
 \
  o--o--J                       <-- feature1

We do git checkout master; git merge --squash feature1 to make new commit C. Git compares A vs B to see what we did on master, and A vs J to see what they (we) did on feature1. Git combines those changes and we get commit C, but with only one parent:

  o--o--o                       <-- feature2
 /
A--o--B---C                     <-- master
 \
  o--o--J                       <-- feature1

Now we'll make D as a squash from feature2:

  o--o--o--o--H                 <-- feature2
 /
A--o--B---C                     <-- master
 \
  o--o--J--o--o                 <-- feature1

Git compares A vs C, and A vs H, same as last time. We now get D. So far it's much the same, except that there are no points where the branches rejoin. But now it is time to make E:

  o--o--o--o--H--o--o           <-- feature2
 /
A--o--B---C-----D               <-- master
 \
  o--o--J--o--o--K              <-- feature1

We run git checkout master; git merge --squash feature1 as before.

Last time, Git compared J-vs-D and J-vs-K, as commit J was our merge base.

This time, commit A is (still) our merge base. Git compares A vs D, and A vs K. If there were conflicts we solved at C last time, we probably have to solve them again. This is bad—but we're not lost yet.
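The difference from a true merge shows up in the same two measurements: the parent count of C and the merge base afterwards. This is a minimal sketch in a throwaway repository; the identity, file names, and the commit_file helper are assumptions for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: create one file named after the commit, then commit it.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file A
git branch feature1
commit_file B
git checkout -q feature1
commit_file J
git checkout -q master

git merge -q --squash feature1    # combines the changes, then stops
git commit -q -m C                # we make the (ordinary) commit ourselves

set -- $(git log -1 --format=%P HEAD)
parent_count=$#                   # number of parents of C

git checkout -q feature1
commit_file K
base=$(git log -1 --format=%s "$(git merge-base master feature1)")
echo "C has $parent_count parent(s); merge base is still $base"
```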


[1] Ordinary, as opposed to merge. As such, a squash merge is not a merge at all: it's a "get me the work done" commit, but it's not a merge commit. We need a real merge commit in addition; we will get to this in the next section.

Git actually stops here and forces you to run git commit to make the squash commit. Why? Who knows, it's Git. :-)

Squash merges can work

To solve the above, we just need to re-merge (with a non-squash "real merge") from master back to the feature branches. That is, instead of simply merging from whichever feature branch into master, and then continuing to work on the feature branch, we do this:

  o--o--o--o--H--*-o--o        <-- feature2
 /              /
A--o--B---C----D               <-- master
 \         \
  o--o--J---*--o--o--K         <-- feature1

These new commits, marked *, are (non-squash) merges from master, into feature1 and feature2. We made squash merge C to pick up changes made from A to J. So we then make a real merge into feature1, preferably using the tree straight from master[1] (which has whatever goodies were in o--B-- as well). (We also made the * on feature2, just as general preparation, after making D on master to bring in everything from A to H. Like the * on feature1 we probably just want the source tree straight from master.)

Now that we're ready to bring in more work from feature1, we can just do another (squash) merge. The merge-base of master and feature1 is commit C, and the two tips are D and K, which is just what we want. Git's merge code will come up with a reasonably close result; we fix up any conflicts, test, fix any breakage, and commit; and then we do another "prep work" merge from master back into feature1 as before.

This work-flow is a bit more complicated than the "merge into master" one, but should give good results.
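The squash-then-merge-back sequence can be tried out on a small graph. This sketch uses an ordinary git merge for the * commit rather than the tree-copying variant mentioned in the footnote, and the temporary repository, identity, file names, and commit_file helper are all assumptions for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: create one file named after the commit, then commit it.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file A
git branch feature1
commit_file B
git checkout -q feature1
commit_file J

git checkout -q master
git merge -q --squash feature1                   # squash J's work into master
git commit -q -m C

git checkout -q feature1
git merge -q -m "merge master" master            # the "*" commit: a real merge
commit_file K                                    # more work on feature1
git checkout -q master
commit_file D                                    # more work on master

base=$(git log -1 --format=%s "$(git merge-base master feature1)")
echo "merge base for the next squash: $base"
```

With C as the merge base, the next git merge --squash feature1 only has to combine what happened after C on each side.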


[1] Git does not make this totally trivial: we want a merge with a -s theirs strategy, which Git simply doesn't have. There is an easy way to get the desired effect using "plumbing" commands, but I'll leave that out of this answer, which is already crazy-long.

So, if that all works, how about the mechanics?

Note that what we want is merge --squash when merging into master, but regular (non-squash) merge when merging from master. In other words:

$ git checkout master && git merge foo

should use --squash, but:

$ git checkout foo && git merge master

should not use --squash. (The tree copying from the footnote in the previous section might be nice, but should be unnecessary: the merge result should basically always be the tree straight out of master.)

When git merge runs, it looks at the current branch (as it always must). If that branch has a name (that is, if we're not in "detached HEAD" mode), Git then looks at your configuration, for a value stored under branch.<name>.mergeOptions, where <name> is the name of the current branch. Any string value here is scanned as if it were part of the git merge command.

Hence:

$ git config branch.master.mergeOptions "--squash"

(the quotes are not technically required, and you can add --global after git config, before branch.master.mergeOptions) sets up your current repository to do squash-merges into master. (With --global, it sets this as your personal default for all repositories. Any branch.master.mergeOptions set in a particular repository will override these global ones, though.)
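To check that the setting behaves as described, one can merge in both directions in a scratch repository and count the parents of each result. This is a minimal sketch, assuming a temporary repository with a placeholder identity; the branch foo is taken from the commands above, while the file names and the commit_file helper are made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
git config branch.master.mergeOptions "--squash"   # applies only while on master

# Hypothetical helper: create one file named after the commit, then commit it.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file A
git checkout -q -b foo
commit_file X                          # work on foo
git checkout -q master
commit_file B                          # work on master

git merge -q foo                       # on master: squash, stops before committing
git commit -q -m "squash foo"
set -- $(git log -1 --format=%P HEAD)
master_parents=$#

git checkout -q foo
git merge -q -m "merge master" master  # on foo: regular merge commit
set -- $(git log -1 --format=%P HEAD)
foo_parents=$#

echo "parents on master: $master_parents, parents on foo: $foo_parents"
```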
