git-log missing merge commit that undid a change


Problem description

Consider this test script.

#!/bin/sh -x

# initialize repository
rm -rf missing-merge-log
mkdir missing-merge-log
cd missing-merge-log
git init

# create files, x, y, and z
echo x > x
echo y > y
echo z > z
git add -A .
git commit -m "initial commit"

# create a branch
git branch branch

# change x and z on master
echo x2 > x
echo z2 > z
git commit -am "changed x to x2, z to z2"
git log master -- x

# change y and z on the branch
git checkout branch
echo y2 > y
echo z3 > z
git commit -am "changed y to y2, z to z3"

# merge master into branch
git merge master
# resolve z conflict
echo z23 > z
git add z
# undo changes to x during merge conflict resolution
# (imagine this was developer error)
git checkout branch -- x
git commit --no-edit

# merge branch into master
git checkout master
git merge branch

# now the x2 commit is entirely missing from the log
git log master -- x

We first create three files, x, y, and z, and create a branch named branch. In master, we commit a change to x and z, and in the branch, we commit a change to y and z.

Then, in the branch, we merge from master, but during merge conflict resolution, we revert the change to x. (For the sake of this example, imagine that this was a developer error; the developer didn't intend to reject the changes to x.)

Finally, back in master, we merge the changes from the branch.

At this point I would expect git log x to show three changes: the initial commit, the change to x on master, and the branch commit that reverted the changes to x.

But instead, at the end of the script, git log shows only the initial commit to x, giving no indication that x had ever been modified! This is with Git version 2.22.0.

Why is git log doing this? Are there parameters to git log -- x that would show what happened here? git log --all -- x doesn't help.

(git log --all does show everything, but in real life that would show all changes to all files, including irrelevant changes to y and z, which would be too difficult to wade through.)

Solution

TL;DR

Use --full-history—but you probably want more options too, so read on.

Long

First, many thanks for the reproducer script! That was very useful here.

Next:

(git log --all does show everything, but in real life that would show all changes to all files, including irrelevant changes to y and z, which would be too difficult to wade through.)

Yes. But it demonstrates that there's no issue with any of the commits; the problem is entirely of git log's making, here. It has to do with the dreaded History Simplification mode, which:

git log master -- x

invokes.

git log without History Simplification

Let me add the output from:

git log --all --decorate --oneline --graph

(This is "git log with help from A DOG": --all, --decorate, --oneline, --graph.) Because I ran the script myself, my hash IDs differ from yours (or from anyone else's who reruns it), but the structure is the same, which lets us talk about the commits:

*   cc7285d (HEAD -> master, branch) Merge branch 'master' into branch
|\  
| * ad686b0 changed x to x2, z to z2
* | dcaa916 changed y to y2, z to z3
|/  
* a222cef initial commit

Now, a normal git log, without -- x to inspect file x, does not turn on history simplification. Git starts at the commit you specify—for instance:

git log dcaa916

starts at dcaa916—or at HEAD if you did not specify anything.

In this case, then, git log starts with commit cc7285d. Git shows that commit, then moves on to that commit's parent(s). Here there are two parents—dcaa916 and ad686b0—so Git places both commits into a priority queue. Then it pulls one of the commits from the head of the queue. When I try this, the one it pulls out is dcaa916. (In more realistic graphs, it will by default use the one with the later committer timestamp, but having built this repository with a script, both commits have the same timestamp.) Git shows that commit and places dcaa916's parent a222cef into the queue. For topological sanity, given this particular graph, the commit at the front of the queue is now always going to be ad686b0, so Git shows that commit and then....

Well, now, the parent of ad686b0 is a222cef, but a222cef is already in the queue! This is where that "for topological sanity" thing comes in. By not showing a222cef too early we make sure that we don't accidentally show a222cef twice (among other issues). The queue now has a222cef in it, and nothing else, so git log takes a222cef off the queue, shows a222cef, and puts a222cef's parents in the queue. In this reproducer-example there are no parents, so the queue remains empty, and git log can finish, and that's just what we see with a regular git log. With help from A DOG, we get the graph too, and the one-line output variant.
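
The walk described above can be checked directly. The following is a minimal sketch, not part of the original script: it rebuilds the question's repository in a scratch directory and lists every commit together with its parents. The mktemp location, the placeholder "Repro" identity, and forcing the branch name to master are assumptions added so the sketch runs on any machine. No path argument is given, so History Simplification stays off.

```sh
#!/bin/sh
set -eu

# Rebuild the question's repository in a scratch directory (assumed location).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # keep the branch name used in the question
git config user.name "Repro"               # placeholder identity so commits succeed
git config user.email "repro@example.com"

echo x > x; echo y > y; echo z > z
git add -A .
git commit -q -m "initial commit"
git branch branch

echo x2 > x; echo z2 > z
git commit -q -am "changed x to x2, z to z2"

git checkout -q branch
echo y2 > y; echo z3 > z
git commit -q -am "changed y to y2, z to z3"

git merge master >/dev/null 2>&1 || true   # the conflict on z is expected
echo z23 > z
git add z
git checkout branch -- x                   # the accidental undo of x
git commit -q --no-edit

git checkout -q master
git merge -q branch                        # fast-forwards master to the merge

# No pathspec: every commit reachable from master is walked and shown.
# Each output line is a commit hash followed by the hashes of its parents.
git rev-list --parents master

total=$(git rev-list --count master)
echo "commits shown without simplification: $total"
```

The merge is the only line with two parent hashes, and the root commit is the only line with none.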

git log with History Simplification

Git doesn't have file history. The history in a repository consists of commits. But git log will do its best to show a file history. To do that, it has to synthesize one, and to do that, Git's authors chose to simply omit some subset of commits. The documentation tries to explain that with a short paragraph:

Sometimes you are only interested in parts of the history, for example the commits modifying a particular <path>. But there are two parts of History Simplification, one part is selecting the commits and the other is how to do it, as there are various strategies to simplify the history.

I think this one-paragraph explanation just doesn't work, but I have not yet come up with what I think is the right explanation, either. :-) What they are trying to express here is this:

  • Git isn't going to show you all the commits. It's going to show some selected subset of commits.

    This part makes perfect sense. We already see that even without History Simplification: Git starts with the last commit, the one we specify with a branch name or with HEAD or whatever, and then works backwards, one commit at a time, placing more than one commit at a time into its priority queue if and when necessary.

    With History Simplification, we still walk the commit graph using a priority queue, but for many commits, we're just not going to show the commit. OK so far—but now Git throws in the twist that led them to write that weird paragraph.

  • If Git isn't going to show you all commits, maybe it can cheat and not even bother to follow some forks.

    This is the hard part to express. When we work backwards from branch-tip towards the commit-graph root, every merge commit, where two streams of commits join up, becomes a fork, where two streams of commits diverge. In particular, commit cc7285d is a merge, and when we don't have History Simplification happening, Git always puts both parents into the queue. But when we do have History Simplification happening, Git sometimes doesn't put these commits into the queue.

The really tricky part here is deciding which commits get into the queue, and that's where the documentation's "more detailed explanation" and TREESAME notion come in. I encourage people to read through it, because it has a lot of good information, but it's very densely packed and is not very good at defining TREESAME in the first place. The documentation puts it this way:

Suppose you specified foo as the <paths>. We shall call commits that modify foo !TREESAME, and the rest TREESAME. (In a diff filtered for foo, they look different and equal, respectively.)

This definition depends on the commit being a non-merge commit!

All commits are snapshots (or more correctly, contain snapshots). So no commit, taken on its own, modifies any file. It just has the file, or doesn't have the file. If it has the file, it has some particular content for the file. To view a commit as a change—as a set of modifications—we need to pick some other commit, extract both commits, and then compare the two. For non-merge commits, there's an obvious commit to use: the parent. Given some chain of commits:

...--F--G--H--...

we'll see what's changed in commit H by extracting both G and H, and comparing them. We'll see what's changed in G by extracting F and G, and comparing them. That's what the TREESAME paragraph here is about: we take F and G, say, and strip out all but the files you asked about. Then we compare the remaining files. Are they the same in the stripped-down F and G? If so, F and G are TREESAME. If not, they're not.

But merge commits have, by definition, at least two parents:

...--K
      \
       M
      /
...--L

If we're at merge commit M, which parent do we pick to determine what's TREESAME and what's not?

Git's answer is to compare the commit to all of the parents, one at a time. Some comparisons may result in "is TREESAME", and others may result in "is not TREESAME". For instance, file foo in M may match file foo in K and/or file foo in L.

Which commits Git uses depends on the additional options you supply to git log:

Default mode

Commits are included if they are not TREESAME to any parent (though this can be changed, see --sparse below). If the commit was a merge, and it was TREESAME to one parent, follow only that parent. (Even if there are several TREESAME parents, follow only one of them.) Otherwise, follow all parents.

So let's consider merge cc7285d, and compare it to each of its (two) parents:

$ git diff --name-status cc7285d^1 cc7285d
M       z
$ git diff --name-status cc7285d^2 cc7285d
M       x
M       y
M       z

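The first diff does not list x, so with respect to x the merge is TREESAME to its first parent, cc7285d^1 (which is dcaa916). The second diff does list x, so the merge is not TREESAME to cc7285d^2. This means that git log will walk only the first parent: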

... If the commit was a merge, and it was TREESAME to one parent, follow only that parent. ...

So this git log walks commit cc7285d, then commit dcaa916, then commit a222cef, and then stops. It never looks at commit cc7285d^2 (which is ad686b0) at all.
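
The per-parent comparison can be reproduced with git diff --quiet, which exits non-zero when the two commits differ in the given path. The sketch below is an approximation of the TREESAME test for a single path, not Git's internal implementation. The treesame helper, the scratch directory, and the placeholder identity are hypothetical names introduced for illustration.

```sh
#!/bin/sh
set -eu

# Rebuild the question's repository in a scratch directory (assumed location).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # keep the branch name used in the question
git config user.name "Repro"               # placeholder identity so commits succeed
git config user.email "repro@example.com"

echo x > x; echo y > y; echo z > z
git add -A .
git commit -q -m "initial commit"
git branch branch

echo x2 > x; echo z2 > z
git commit -q -am "changed x to x2, z to z2"

git checkout -q branch
echo y2 > y; echo z3 > z
git commit -q -am "changed y to y2, z to z3"

git merge master >/dev/null 2>&1 || true   # the conflict on z is expected
echo z23 > z
git add z
git checkout branch -- x                   # the accidental undo of x
git commit -q --no-edit

git checkout -q master
git merge -q branch

# Hypothetical helper: compare one parent with a commit, restricted to a path.
# usage: treesame <parent> <commit> <path>
treesame() {
    if git diff --quiet "$1" "$2" -- "$3"; then
        echo "TREESAME"
    else
        echo "!TREESAME"
    fi
}

first=$(treesame master^1 master x)    # parent from "branch"
second=$(treesame master^2 master x)   # parent from the old "master"
echo "merge vs first parent,  path x: $first"
echo "merge vs second parent, path x: $second"
```

Because the merge is TREESAME to its first parent for x, the default mode follows only that parent and never reaches the commit that changed x to x2.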

The rest of this section of the git log documentation describes the options --full-history, --dense, --sparse, and --simplify-merges (and even I don't understand the true purpose of the last option :-) ). Of all of these, --full-history is the most obvious and will do what you want. (--ancestry-path and --simplify-by-decoration are in this section as well, but they don't affect paths at merges.)
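
To see the effect of --full-history on the question's repository, the sketch below counts the commits selected for x with and without the option. It rebuilds the repository first. The scratch directory, the placeholder identity, and the variable names are assumptions for illustration. git rev-list is used instead of git log only because its --count option makes the result easy to compare; both commands share the same history-simplification options.

```sh
#!/bin/sh
set -eu

# Rebuild the question's repository in a scratch directory (assumed location).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # keep the branch name used in the question
git config user.name "Repro"               # placeholder identity so commits succeed
git config user.email "repro@example.com"

echo x > x; echo y > y; echo z > z
git add -A .
git commit -q -m "initial commit"
git branch branch

echo x2 > x; echo z2 > z
git commit -q -am "changed x to x2, z to z2"

git checkout -q branch
echo y2 > y; echo z3 > z
git commit -q -am "changed y to y2, z to z3"

git merge master >/dev/null 2>&1 || true   # the conflict on z is expected
echo z23 > z
git add z
git checkout branch -- x                   # the accidental undo of x
git commit -q --no-edit

git checkout -q master
git merge -q branch

# Default mode: the merge is TREESAME to its first parent for x,
# so only that side of the merge is walked.
default_count=$(git rev-list --count master -- x)

# --full-history: every parent of the merge is walked.
full_count=$(git rev-list --count --full-history master -- x)

echo "default mode:   $default_count commit(s) touching x"
echo "--full-history: $full_count commit(s) touching x"

git --no-pager log --full-history --oneline master -- x
```

In this repository the default mode should report only the initial commit, while --full-history should also list the x2 commit and the merge that undid it.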

Caveats

While --full-history will make sure that Git walks through all "legs" of each merge, git log -p itself by default shows no diffs for merge commits. You must add one of three options—-c, --cc, or -m—to make git log -p show any diff at all for any merge.

If your goal is specifically to find a bad two-parent merge, one that drops some particular change that should have been retained, you probably want to show the diff from that merge to at least one, and perhaps both, of its two parents. The git show command will do this, but its default is --cc style. The git log command won't do it at all. If you add --cc to your git log, you'll get the same diff that git show would show by default—and that's not going to work either.

The --cc or -c options tell Git that, when looking at a merge commit, Git should diff the commit against all the parents, then produce a summary diff, rather than a detailed one. The contents of the summary exclude parts that match one or all parents. You're looking for a merge that accidentally dropped an important change—a merge that is the same as at least one of its parents, when it should be different from that parent. This combined diff is going to hide the place where the change isn't-but-should-be. So you don't want -c or --cc.

That leaves the -m option. When git show or git log is going to show a diff, and the commit is a merge commit, Git will show one diff per parent. That is, for a merge commit like M, git show -m will first compare K vs M and show that diff. Then it will compare L vs M and show the other diff. That's the option you want here, for this particular case.

Note that -m combines nicely with --first-parent to show only the full diff against the first parent of each merge. Often that's exactly what you want.
