git-log missing merge commit that undid a change
Consider this test script.
#!/bin/sh -x
#initialize repository
rm -rf missing-merge-log
mkdir missing-merge-log
cd missing-merge-log
git init
# create files, x, y, and z
echo x > x
echo y > y
echo z > z
git add -A .
git commit -m "initial commit"
# create a branch
git branch branch
# change x and z on master
echo x2 > x
echo z2 > z
git commit -am "changed x to x2, z to z2"
git log master -- x
# change y and z on the branch
git checkout branch
echo y2 > y
echo z3 > z
git commit -am "changed y to y2, z to z3"
# merge master into branch
git merge master
# resolve z conflict
echo z23 > z
git add z
# undo changes to x during merge conflict resolution
# (imagine this was developer error)
git checkout branch -- x
git commit --no-edit
# merge branch into master
git checkout master
git merge branch
# now the x2 commit is entirely missing from the log
git log master -- x
We first create three files, x, y, and z, and create a branch named branch. In master, we commit a change to x and z, and in the branch, we commit a change to y and z.
Then, in the branch, we merge from master, but during merge conflict resolution, we revert the change to x. (For the sake of this example, imagine that this was a developer error; the developer didn't intend to reject the changes to x.)
Finally, back in master, we merge the changes from the branch.
I would expect git log x at this point to show three changes: the initial commit, the change to x on master, and the branch commit that reverted the changes to x.
But instead, at the end of the script, git log just shows the initial commit to x, giving no indication that x had ever been modified! This is using git version 2.22.0.
Why is git log doing this? Are there parameters to git log -- x that would show what happened here? git log --all -- x doesn't help.
(git log --all does show everything, but in real life that would show all changes to all files, including irrelevant changes to y and z, which would be too difficult to wade through.)
TL;DR
Use --full-history, but you probably want more options too, so read on.
Long
First, many thanks for the reproducer script! That was very useful here.
Next:
(git log --all does show everything, but in real life that would show all changes to all files, including irrelevant changes to y and z, which would be too difficult to wade through.)
Yes. But it demonstrates that there's no issue with any of the commits; the problem here is entirely of git log's making. It has to do with the dreaded History Simplification mode, which git log master -- x invokes.
git log without History Simplification
Let me add the output from:
git log --all --decorate --oneline --graph
("git log with help from A DOG"). Since I made my own reproduction using the script, the hash IDs below differ from yours (or from anyone else's doing another repro), but the structure is the same, which lets us talk about the commits:
* cc7285d (HEAD -> master, branch) Merge branch 'master' into branch
|\
| * ad686b0 changed x to x2, z to z2
* | dcaa916 changed y to y2, z to z3
|/
* a222cef initial commit
Now, a normal git log, without -- x to inspect file x, does not turn on history simplification. Git starts at the commit you specify (for instance, git log dcaa916 starts at dcaa916), or at HEAD if you did not specify anything.
In this case, then, git log starts with commit cc7285d. Git shows that commit, then moves on to that commit's parent(s). Here there are two parents, dcaa916 and ad686b0, so Git places both commits into a priority queue. Then it pulls one of the commits from the head of the queue. When I try this, the one it pulls out is dcaa916. (In more realistic graphs, it will by default use the one with the later committer timestamp, but having built this repository with a script, both commits have the same timestamp.) Git shows that commit and places dcaa916's parent a222cef into the queue. For topological sanity, given this particular graph, the commit at the front of the queue is now always going to be ad686b0, so Git shows that commit and then....
Well, now, the parent of ad686b0 is a222cef, but a222cef is already in the queue! This is where that "for topological sanity" thing comes in. By not showing a222cef too early we make sure that we don't accidentally show a222cef twice (among other issues). The queue now has a222cef in it, and nothing else, so git log takes a222cef off the queue, shows a222cef, and puts a222cef's parents in the queue. In this reproducer example there are no parents, so the queue remains empty, and git log can finish, and that's just what we see with a regular git log. With help from A DOG, we get the graph too, and the one-line output variant.
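As a rough check of the walk described above, the following sketch rebuilds the question's repository in a temporary directory, then asks Git how many commits a plain walk from HEAD visits and how many parents the merge has. The demo identity, the temporary directory, and the variable names are assumptions made for illustration; they are not part of the original script.

```sh
#!/bin/sh
# Rebuild the question's repository in a throwaway directory (assumed setup).
repo=$(mktemp -d) && cd "$repo" || exit 1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the question
echo x > x; echo y > y; echo z > z
git add -A . && git commit -q -m "initial commit"
git branch branch
echo x2 > x; echo z2 > z
git commit -q -am "changed x to x2, z to z2"
git checkout -q branch
echo y2 > y; echo z3 > z
git commit -q -am "changed y to y2, z to z3"
git merge master >/dev/null 2>&1 || true   # the conflict in z is expected
echo z23 > z; git add z
git checkout branch -- x                   # the "developer error": drop x2
git commit -q --no-edit
git checkout -q master
git merge -q branch                        # fast-forwards master to the merge

# Without a path argument there is no history simplification:
# every commit reachable from HEAD is walked and shown.
total=$(git rev-list --count HEAD)

# rev-list --parents prints the commit followed by its parents.
set -- $(git rev-list --parents -n 1 HEAD)
merge_parents=$(($# - 1))

echo "commits walked: $total, parents of the merge: $merge_parents"
git log --oneline --graph
```

The hash IDs printed will differ from the ones quoted in this answer, but the graph shape should match.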
git log with History Simplification
Git doesn't have file history. The history in a repository consists of commits. But git log will do its best to show a file history. To do that, it has to synthesize one, and to do that, Git's authors chose to simply omit some subset of commits. The documentation tries to explain that with a one-sentence paragraph:
Sometimes you are only interested in parts of the history, for example the commits modifying a particular <path>. But there are two parts of History Simplification, one part is selecting the commits and the other is how to do it, as there are various strategies to simplify the history.
I think this one-paragraph explanation just doesn't work, but I have not yet come up with what I think is the right explanation, either. :-) What they are trying to express here is this:
Git isn't going to show you all the commits. It's going to show some selected subset of commits.
This part makes perfect sense. We already see that even without History Simplification: Git starts with the last commit, the one we specify with a branch name or with HEAD or whatever, and then works backwards, one commit at a time, placing more than one commit at a time into its priority queue if and when necessary. With History Simplification, we still walk the commit graph using a priority queue, but for many commits, we're just not going to show the commit. OK so far, but now Git throws in the twist that led them to write that weird paragraph.
If Git isn't going to show you all commits, maybe it can cheat and not even bother to follow some forks.
This is the hard part to express. When we work backwards from branch-tip towards the commit-graph root, every merge commit, where two streams of commits join up, becomes a fork, where two streams of commits diverge. In particular, commit cc7285d is a merge, and when we don't have History Simplification happening, Git always puts both parents into the queue. But when we do have History Simplification happening, Git sometimes doesn't put these commits into the queue.
The really tricky part here is deciding which commits get into the queue, and that's where the documentation's "more detailed explanation" and TREESAME notion come in. I encourage people to read through it, because it has a lot of good information, but it's very densely packed and is not very good at defining TREESAME in the first place. The documentation puts it this way:
Suppose you specified foo as the <paths>. We shall call commits that modify foo !TREESAME, and the rest TREESAME. (In a diff filtered for foo, they look different and equal, respectively.)
This definition depends on the commit being a non-merge commit!
All commits are snapshots (or more correctly, contain snapshots). So no commit, taken on its own, modifies any file. It just has the file, or doesn't have the file. If it has the file, it has some particular content for the file. To view a commit as a change—as a set of modifications—we need to pick some other commit, extract both commits, and then compare the two. For non-merge commits, there's an obvious commit to use: the parent. Given some chain of commits:
...--F--G--H--...
we'll see what's changed in commit H by extracting both G and H, and comparing them. We'll see what's changed in G by extracting F and G, and comparing them. That's what the TREESAME paragraph here is about: we take F and G, say, and strip out all but the files you asked about. Then we compare the remaining files. Are they the same in the stripped-down F and G? If so, F and G are TREESAME. If not, they're not.
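The comparison described above can be approximated by hand with git diff --quiet, which exits with status 0 when two commits agree on the given path. The sketch below is only an illustration: treesame is a hypothetical helper (not a Git command), and the two-commit repository, demo identity, and variable names are assumptions.

```sh
#!/bin/sh
# Sketch: a hand-rolled TREESAME check for a non-merge commit.
repo=$(mktemp -d) && cd "$repo" || exit 1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q
echo x > x; echo y > y
git add -A . && git commit -q -m "initial commit"
echo x2 > x
git commit -q -am "changed x to x2"

# treesame COMMIT PATH (hypothetical helper): compare COMMIT with its first
# parent, looking only at PATH, and report the result in the docs' terms.
treesame() {
    if git diff --quiet "$1^" "$1" -- "$2"; then
        echo TREESAME
    else
        echo '!TREESAME'
    fi
}

x_result=$(treesame HEAD x)   # x changed in HEAD
y_result=$(treesame HEAD y)   # y did not change in HEAD
echo "x: $x_result"
echo "y: $y_result"
```

This mirrors the definition for non-merge commits only; Git's own implementation compares trees directly rather than running a diff command.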
But merge commits have, by definition, at least two parents:
...--K
      \
       M
      /
...--L
If we're at merge commit M, which parent do we pick to determine what's TREESAME and what's not?
Git's answer is to compare the commit to all of the parents, one at a time. Some comparisons may result in "is TREESAME", and others may result in "is not TREESAME". For instance, file foo in M may match file foo in K and/or file foo in L.
Which commits Git uses depends on the additional options you supply to git log:
Default mode
Commits are included if they are not TREESAME to any parent (though this can be changed, see --sparse below). If the commit was a merge, and it was TREESAME to one parent, follow only that parent. (Even if there are several TREESAME parents, follow only one of them.) Otherwise, follow all parents.
So let's consider merge cc7285d, and compare it to each of its (two) parents:
$ git diff --name-status cc7285d^1 cc7285d
M z
$ git diff --name-status cc7285d^2 cc7285d
M x
M y
M z
This means that git log will walk only the first parent, commit cc7285d^1 (which is dcaa916); this is the one that doesn't change x:
... If the commit was a merge, and it was TREESAME to one parent, follow only that parent. ...
So this git log walks commit cc7285d, then commit dcaa916, then commit a222cef, and then stops. It never looks at commit cc7285d^2 (which is ad686b0) at all.
The rest of this section of the git log documentation describes the options --full-history, --dense, --sparse, and --simplify-merges (and even I don't understand the true purpose of the last option :-) ). Of all of these, --full-history is the most obvious and will do what you want. (--ancestry-path and --simplify-by-decoration are in this section as well, but they don't affect paths at merges.)
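To see the difference between the default mode and --full-history on the question's repository, a sketch along these lines should work. It rebuilds the repository in a temporary directory, checks the merge against each parent for path x, and counts the commits each mode lists. The demo identity, the temporary directory, and the variable names are assumptions for illustration.

```sh
#!/bin/sh
# Rebuild the question's repository in a throwaway directory (assumed setup).
repo=$(mktemp -d) && cd "$repo" || exit 1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the question
echo x > x; echo y > y; echo z > z
git add -A . && git commit -q -m "initial commit"
git branch branch
echo x2 > x; echo z2 > z
git commit -q -am "changed x to x2, z to z2"
git checkout -q branch
echo y2 > y; echo z3 > z
git commit -q -am "changed y to y2, z to z3"
git merge master >/dev/null 2>&1 || true   # the conflict in z is expected
echo z23 > z; git add z
git checkout branch -- x                   # the "developer error": drop x2
git commit -q --no-edit
git checkout -q master
git merge -q branch                        # fast-forwards master to the merge

# Compare the merge (HEAD) with each parent, restricted to x.
if git diff --quiet HEAD^1 HEAD -- x; then p1=TREESAME; else p1='!TREESAME'; fi
if git diff --quiet HEAD^2 HEAD -- x; then p2=TREESAME; else p2='!TREESAME'; fi
echo "merge vs first parent: $p1, merge vs second parent: $p2"

# Count the commits each mode would list for x.
default_count=$(git rev-list --count HEAD -- x)
full_count=$(git rev-list --count --full-history HEAD -- x)
echo "default: $default_count commit(s), --full-history: $full_count commit(s)"

git log --full-history --oneline -- x
```

With --full-history, the listing should include the initial commit, the commit that changed x to x2, and the merge that undid it, which is the three-entry history the question expected.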
Caveats
While --full-history will make sure that Git walks through all "legs" of each merge, git log -p itself by default shows no diffs for merge commits. You must add one of three options (-c, --cc, or -m) to make git log -p show any diff at all for any merge.
If your goal is specifically to find a bad two-parent merge, one that drops some particular change that should have been retained, you probably want to show the diff from that merge to at least one, and perhaps both, of its two parents. The git show command will do this, but its default is --cc style. The git log command won't do it at all. If you add --cc to your git log, you'll get the same diff that git show would show by default, and that's not going to work either.
The --cc or -c options tell Git that, when looking at a merge commit, Git should diff the commit against all the parents, then produce a summary diff, rather than a detailed one. The contents of the summary exclude parts that match one or all parents. You're looking for a merge that accidentally dropped an important change: a merge that is the same as at least one of its parents, when it should be different from that parent. This combined diff is going to hide the place where the change isn't-but-should-be. So you don't want -c or --cc.
That leaves the -m option. When git show or git log is going to show a diff, and the commit is a merge commit, Git will show one diff per parent. That is, for a merge commit like M, git show -m will first compare K vs M and show that diff. Then it will compare L vs M and show the other diff. That's the option you want here, for this particular case.
Note that -m combines nicely with --first-parent to show only the full diff against the first parent of each merge. Often that's exactly what you want.