How to find commit responsible by adding a file index (blob)


Problem description

When we run git diff Version1..Version2 -- file, the command returns something like:

diff --git a/wp-includes/version.php b/wp-includes/version.php
index 5d034bb9d8..617021e8d9 100644

Here Git compares two versions of a file and shows the difference between them. Given the index hashes 5d034bb9d8 and 617021e8d9, I need to find the commit responsible for adding the file in question.

Solution

TL;DR

This (untested) script may do what you want. Read the rest for how it works, if and when it works, and caveats.

#! /bin/sh
case $# in
2);;
*) echo "usage: script left-specifier right-specifier" 1>&2; exit 1;;
esac
# turn arguments into hashes, then ensure they are commits
L=$(git rev-parse "$1") || exit
R=$(git rev-parse "$2") || exit
L=$(git rev-parse "$L^{commit}") || exit
R=$(git rev-parse "$R^{commit}") || exit

# get the blob hashes, exit if they don't exist
haveblob=$(git rev-parse "$L:wp-includes/version.php") || exit
wantblob=$(git rev-parse "$R:wp-includes/version.php") || exit
# The loop is the right-hand side of a pipe, so it runs in a subshell and
# its "exit" only sets the pipeline's status. The failure path lives in the
# same subshell so that the script's final status is correct either way.
git rev-list --reverse --topo-order "$R" "^$L^@" | (
    while read -r hash; do
        thisblob=$(git rev-parse "$hash:wp-includes/version.php")
        test "$thisblob" = "$haveblob" && continue
        if [ "$thisblob" = "$wantblob" ]; then
            echo "target file appears in commit $hash"
            exit 0 # we've found it - succeed and quit
        fi
        echo "note: commit $hash contains a different version than either end"
    done
    echo "error: got to the bottom of the loop" 1>&2
    exit 1
)

Long

Let's clarify this a bit more: you've run:

$ git diff <commit1> <commit2> -- wp-includes/version.php

and its output reads, in part:

index 5d034bb9d8..617021e8d9 100644

Let's call <commit1>—which you specified by hash or tag or branch name or whatever—L, where L stands for left side of git diff. Let's call the second commit R, for the right side.

You want to find some commit that comes at or after L, and before or at R, where file wp-includes/version.php matches the version in R, i.e., the one whose abbreviated hash is 617021e8d9. But you don't want just any commit: you want the first such commit—the one closest to L.
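Before searching, it can help to confirm that R really holds the blob named on the right of the index line. The following is a minimal sketch, not part of the original answer: the function name blob_matches_abbrev is made up for illustration, and it assumes the abbreviated hash is a plain prefix of the full blob hash.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the blob stored for FILE in COMMIT has a
# hash that starts with ABBREV (e.g. the 617021e8d9 from the index line).
blob_matches_abbrev() {
    commit=$1 file=$2 abbrev=$3
    # --verify --quiet: print the hash, or fail silently if the file is absent
    full=$(git rev-parse --verify --quiet "$commit:$file") || return 1
    case $full in
        "$abbrev"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use, with the names from the question:
#   blob_matches_abbrev Version2 wp-includes/version.php 617021e8d9 &&
#       echo "Version2 holds the right-hand blob"
```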

It's worth noting, first, that there may be no sensible relationship at all between the two commits. That is, if we were to draw a graph of the commit history, it might be simple:

...--o--o--L--M--N--...--Q--R--o--o--o   <-- branch

But it might not be so simple. For the moment, let's assume that it is simple.

The simple case: L comes before R and there's a straight line of commits in between

In this case, there's some direct causal relationship in getting from L to R. The answer to your question will make a lot of sense. Specifically, it answers the question: where did this version come from? There's a direct line of commits starting at L and ending at R and the version that's in R might be in an earlier commit too. Let's see how to find the earliest commit, in the L-to-R sequence, that has the same version that's in R.

First, note that each commit represents a complete snapshot of all the files that are in that snapshot. That is, if we look at commit N above, it has all the files, in some form or another. The copy of wp-includes/version.php in N might match the one in L or might match the one in R. (It clearly cannot match both: if it did, the one in L would match the one in R and there would be no index line and no diff output.)

It's possible that the file is in L and R but is not in any of the commits in between, but in that case, the answer is: The file first appears in R.

It's also possible that the file is in L and R and in some, but not all, of the intermediate commits: say L has it, then it's removed in M, then it appears again in N in the form it has in R, then it's removed again in O, and so on. So it's present in L, N, P, and R; it's missing in M, O, and Q. Now the question is more difficult: do you want to see it in N, even though it's gone again in O? Or do you want to see it only in R since it's missing in Q?

In any case, what we need to do is enumerate all the commits in the range L through R. So we'll start with:

git rev-list L..R

(which will omit L, which is kind of annoying). Git will enumerate these in a reverse-ish order; since we know the chain is linear, this is in fact straight reverse order. (We'll see how to enforce a sensible order for more complex cases later.) To check L itself as well, we can just add it explicitly:

(git rev-list L..R; git rev-parse L)

or we can use the rather complicated trick of:

lhash=$(git rev-parse L); git rev-list R ^${lhash}^@

(for details see the gitrevisions documentation). The simpler:

git rev-list L^..R

usually works as well: it fails only when L is a root commit.
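The inclusive enumeration can be wrapped in a small helper. This is only a sketch under the assumptions above: the name list_range_inclusive is invented here, and it uses the "^L^@" form (exclude the parents of L rather than L itself) together with --reverse so that the oldest commit comes first.

```shell
#!/bin/sh
# Hypothetical helper: print the commits from L through R, both included,
# oldest first.
list_range_inclusive() {
    # Resolve both arguments to full commit hashes, failing on bad input.
    left=$(git rev-parse --verify --quiet "$1^{commit}") || return 1
    right=$(git rev-parse --verify --quiet "$2^{commit}") || return 1
    # "^$left^@" excludes every parent of L, so L itself stays in the output.
    git rev-list --reverse --topo-order "$right" "^$left^@"
}

# Possible use:  list_range_inclusive Version1 Version2
```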

In any case, the output of git rev-list is a bunch of commit hash IDs: the hash ID of commit R, then that of commit Q, then that of commit P, and so on, all the way back to L. So we'll pipe the output of this git rev-list through commands to figure out where our particular blob came from. But we want to visit the commits in the other order: L first, then M, then N, all the way up to R. So we add --reverse to the git rev-list arguments.

The rest of this assumes we're writing this script in sh or bash or similar. Before we run git rev-list, let's get the full blob-hash of each version of the file. Then we'll have them in the loop:

#! /bin/sh
case $# in
2);;
*) echo "usage: script left-specifier right-specifier" 1>&2; exit 1;;
esac
# turn arguments into hashes, then ensure they are commits
L=$(git rev-parse "$1") || exit
R=$(git rev-parse "$2") || exit
L=$(git rev-parse $L^{commit}) || exit
R=$(git rev-parse $R^{commit}) || exit

# get the blob hashes, exit if they don't exist
haveblob=$(git rev-parse $L:wp-includes/version.php) || exit
wantblob=$(git rev-parse $R:wp-includes/version.php) || exit
git rev-list --reverse $R ^$L^@ | while read hash; do
    ...
done

Inside the loop, let's get the blob hash for this commit:

    thisblob=$(git rev-parse $hash:wp-includes/version.php)

If this fails, that means the file is removed. We can choose to ignore that and skip this commit, by adding || continue, or stop with || break, or we can simply ignore the possibility entirely on the assumption that the file will exist in each commit. Since the last is the simplest, I will do that here.
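For the first option (skip commits where the file is missing), a quiet lookup helper keeps the loop free of error noise. This is a sketch rather than the answer's own code; blob_at is a made-up name, and it relies on git rev-parse --verify --quiet failing silently when the path does not exist in the commit.

```shell
#!/bin/sh
# Hypothetical helper: print the blob hash of FILE as stored in COMMIT,
# or return non-zero without any message if the file is not there.
blob_at() {
    git rev-parse --verify --quiet "$1:$2"
}

# Inside the loop this could replace the plain assignment:
#   thisblob=$(blob_at "$hash" wp-includes/version.php) || continue
```

Note that for a directory path this lookup would print a tree hash rather than a blob hash, so it is only meant for file paths.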

If this hash matches $haveblob, it's not very interesting. If it matches $wantblob, it's very interesting. If it's something else entirely, well, let's call that out. So the remainder of the loop is:

    test "$thisblob" = "$haveblob" && continue
    if [ "$thisblob" = "$wantblob" ]; then
        echo "target file appears in commit $hash"
        exit 0 # we've found it - succeed and quit
    fi
    echo "note: commit $hash contains a different version than either end"

and that's the script in the top section (well, mostly).
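One detail matters when the loop sits on the right-hand side of a pipe: in many sh implementations that loop runs in a subshell, so an exit inside it ends only the subshell, not the whole script. The sketch below is one way to carry the result out through the pipeline's exit status. The function name find_first_match and the FILE parameter are assumptions added for illustration, and unlike the script above it skips commits in which the file is missing.

```shell
#!/bin/sh
# Hypothetical function: print the first commit from L through R whose copy
# of FILE is identical to the copy in R. Returns 0 if one is found, else 1.
find_first_match() {
    left=$(git rev-parse --verify --quiet "$1^{commit}") || return 1
    right=$(git rev-parse --verify --quiet "$2^{commit}") || return 1
    file=$3
    wantblob=$(git rev-parse --verify --quiet "$right:$file") || return 1
    # The explicit subshell makes "exit" behave the same in every shell:
    # it ends the subshell, and its status becomes the pipeline's status.
    git rev-list --reverse --topo-order "$right" "^$left^@" | (
        while read -r hash; do
            # Skip commits in which the file does not exist.
            thisblob=$(git rev-parse --verify --quiet "$hash:$file") || continue
            if [ "$thisblob" = "$wantblob" ]; then
                echo "$hash"
                exit 0
            fi
        done
        exit 1  # reached the end of the list without a match
    )
}

# Possible use:
#   find_first_match Version1 Version2 wp-includes/version.php
```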

More complex cases introduce more caveats

The graph could be rather branch-y internally; R could even be a merge commit:

       M-----N
      /       \
...--L         R   <-- branch
      \       /
       O--P--Q

or come after one:

       M--N
      /    \
...--L      Q--R   <-- branch
      \    /
       O--P

Or, the graph could be such that L and R are wildly different:

...--o--o--o--L--o--o   <-- branch1
      \
       o--...--o--R--o   <-- branch2

or (if there are multiple root commits) they could even be completely unrelated, graph-wise:

A--B--L   <-- br1

C--D--R   <-- br2

Or, they might be related, whether or not it's a simple linear relationship, but backwards:

...--o--R--E--F--G--L--o--...--o   <-- branch

If the two commits are backwards like this, you should simply swap them. (The script could do this: git merge-base --is-ancestor A B tests whether commit A is an ancestor of commit B.)
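The swap could look something like the following. It is a sketch only: order_commits is an invented name, and it assumes both arguments already name valid commits.

```shell
#!/bin/sh
# Hypothetical helper: print "older newer" for two commits on one line of
# history, swapping them if they were given backwards. Returns 1 if neither
# commit is an ancestor of the other.
order_commits() {
    if git merge-base --is-ancestor "$1" "$2"; then
        echo "$1 $2"
    elif git merge-base --is-ancestor "$2" "$1"; then
        echo "$2 $1"
    else
        return 1
    fi
}

# Possible use:
#   set -- $(order_commits "$L" "$R") || exit
#   L=$1 R=$2
```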

If they're not directly related, the L..R syntax will exclude commits reachable from L while listing commits reachable from R. If they're completely unrelated, commits reachable from R are unreachable from L, so this is just "all commits in the history up to R". In either case, you may or may not find an answer, and it may or may not make any sense.

You can test for these cases with git merge-base above: if neither is an ancestor of the other, they may be related through a common third ancestor—the actual merge base of the two commits—or they may be completely unrelated.
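Putting those tests together, a script could classify the pair before deciding whether a search makes sense. This is a rough sketch; the name describe_relationship and the words it prints are made up here. It relies on git merge-base exiting with a non-zero status when the two commits share no common ancestor.

```shell
#!/bin/sh
# Hypothetical helper: describe how commits A and B relate in the graph.
describe_relationship() {
    if git merge-base --is-ancestor "$1" "$2"; then
        echo "forward"              # A is an ancestor of B
    elif git merge-base --is-ancestor "$2" "$1"; then
        echo "backward"             # B is an ancestor of A: swap them
    elif base=$(git merge-base "$1" "$2"); then
        echo "diverged at $base"    # related through a common third ancestor
    else
        echo "unrelated"            # no common ancestor at all
    fi
}

# Possible use:  describe_relationship Version1 Version2
```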

If there are branches "between" L and R so that there is a merge at or before R, the traversal may occur in some difficult-to-predict order. To force Git to enumerate the commits in a topologically-sorted order, I use --topo-order in the actual script. This forces Git to traverse each "leg" of a merge one at a time. That's not necessarily critical here, but it makes reasoning about the script's output easier.
