git filter-branch - discard the changes to a set of files in a range of commits


Question

Say I have a branch dev and I want to discard all the changes made to a set of files in the range of commits on the dev branch since it diverged from master. If a commit in this range touches only those files, I'd like it pruned. The closest I got was:

git checkout dev
git filter-branch --force --tree-filter 'git checkout master -- \
a/b/c.png \
...
' --prune-empty -- master-dev-older-ancestor..HEAD

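But this has these drawbacks:

  1. If the file has since been deleted in master, it fails with error: pathspec 'a/b/c.png' did not match any file(s) known to git. I could instead git checkout master-dev-older-ancestor, but then:
  2. The file may not be present in master-dev-older-ancestor and may have been merged back into dev from master some time later.
  3. I may, after all, want to discard changes to some files that are nowhere to be seen in master.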

Fundamentally the point is that I do not want to tell git to check out a specific version of the file. I want to tell git to filter all commits in the range master-dev-older-ancestor..HEAD so that all changes to an arbitrary set of files (present anywhere on master or not) are discarded.

So how do I tell git that?

Recommended answer

Fundamentally, what filter-branch does is this (everything else is optimization and/or edge cases; see footnote 1):

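  • For each commit in the listed revision(s):
  1. check out that commit;
  2. apply the filters;
  3. make a new commit from the result of step 2, which may or may not be the same as the old commit (that is, the new copy is a modified version of the old commit unless it is bit-for-bit identical, in which case the "new" commit is actually just the old commit after all).

  • For each "positive" ref given on the command line, rewrite it to point to the new commit made in step 3 wherever it pointed to the old commit checked out in step 1.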
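
The loop described above can be watched on a throwaway repository. This is a minimal sketch, not part of the original answer: the file names keep.txt and drop.txt and the commit messages are invented for illustration, and FILTER_BRANCH_SQUELCH_WARNING is set only to silence the notice that recent Git versions print before running filter-branch.

```shell
# Throwaway repository; every name in it is illustrative only.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example" && git config user.email "example@example.com"

echo one > keep.txt && git add keep.txt && git commit -q -m "C1"
echo junk > drop.txt && git add drop.txt && git commit -q -m "C2"
old_tip=$(git rev-parse HEAD)

# For every commit reachable from HEAD: check it out, run the filter
# (delete drop.txt if it is there), and commit the result.
FILTER_BRANCH_SQUELCH_WARNING=1 \
git filter-branch -f --tree-filter 'rm -f drop.txt' -- HEAD >/dev/null 2>&1

new_tip=$(git rev-parse HEAD)
echo "old tip: $old_tip"
echo "new tip: $new_tip"
git ls-tree --name-only HEAD    # drop.txt is absent from the rewritten tip
```

The branch that HEAD names is the "positive" ref here, so it now points at the rewritten commit, while filter-branch keeps the old tip under refs/original/.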

    Now let's consider your desired action, but I'm going to emphasize a different word:

    filter all commits in [a] range ... to have all changes in an arbitrary set of files ... discarded

    I emphasize "changes" here because each commit is a complete, stand-alone entity. Commits don't have "changes", they just have files. The only way to see changes is to compare one specific commit against another specific commit: git diff commitA commitB for example.
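
A small sketch may make the snapshot point concrete. The repository, the file names a.txt and b.txt, and the variables commitA and commitB are all made up for this illustration.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example" && git config user.email "example@example.com"

printf 'v1\n' > a.txt && printf 'same\n' > b.txt
git add . && git commit -q -m "commitA"
commitA=$(git rev-parse HEAD)
printf 'v2\n' > a.txt && git commit -q -am "commitB"
commitB=$(git rev-parse HEAD)

# The second commit still records every file, including the untouched b.txt.
files=$(git ls-tree --name-only "$commitB" | tr '\n' ' ')
echo "files in commitB: $files"

# A "change" only shows up once two specific commits are compared.
changed=$(git diff --name-only "$commitA" "$commitB")
echo "changed between commitA and commitB: $changed"
```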

    Thus, when you say "changes to some file(s)", the immediate obvious question should be: changes with respect to what?

    In most cases, people who talk about "changes in a commit" mean "changes in this commit with respect to its immediate ancestor": for simple (non-merge) commits, the patch you'd get with git show or git log -p. (Usually they have not thought about what they mean if the commit is a merge, and therefore has multiple parents. For these, git show generally shows a combined diff of the merge commit against all its parents, but that may not match the user's intent here; see the git-show documentation for details.)

    When using git filter-branch, you will have to define this (changes with respect to what) yourself. The filter-branch command gives you the SHA-1 ID of the checked-out commit—even if it's only "virtually" checked out in step 1, rather than actually stuffed into an on-disk tree—in the environment variable $GIT_COMMIT. So, if your definition of "with respect to what" is "with respect to first parent", you can use gitrevisions syntax to refer to the parent: ${GIT_COMMIT}^ is the first-parent, even when ${GIT_COMMIT} is a raw SHA-1.
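
The parent syntax can be tried outside a filter. In this sketch GIT_COMMIT is assigned by hand to imitate what filter-branch would provide; the repository and the file name f are assumptions made for the example.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example" && git config user.email "example@example.com"

echo 1 > f && git add f && git commit -q -m "parent"
parent=$(git rev-parse HEAD)
echo 2 > f && git commit -q -am "child"

# filter-branch sets GIT_COMMIT for each commit it rewrites; here it is
# set manually so the ^ suffix can be tested on a raw hash.
GIT_COMMIT=$(git rev-parse HEAD)
first_parent=$(git rev-parse "${GIT_COMMIT}^")
echo "first parent: $first_parent"
git show "${GIT_COMMIT}^:f"    # prints the parent's version of f
```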

    A very crude and un-optimized --tree-filter that simply extracts the parent version of each such file goes like this (see footnote 2):

    for path in ...list-of-paths...; do
        # quote the revision and the path so names with spaces survive
        git checkout -q "${GIT_COMMIT}^" -- "$path" 2>/dev/null
    done
    exit 0 # in case the last "git checkout" failed, override its status
    

    which simply asks git to retrieve the parent commit's version of the file, discarding any error message that occurs because the file does not exist in the parent version. But this may not match your intent either: it's not clear whether you want to remove the file if it is not in the parent. Moreover, if a file is added or removed somewhere in the sequence of commits in your range, comparing each original commit only to its (single) original parent commit may mis-fire. For instance, if file foo does not exist in commit C5, does exist in C6, and remains unchanged in C7, the comparison between C7 and C6 says "file unchanged" while the earlier comparison of C5-to-C6 says "file added". If your new (altered) C6—let's call it C6' to tell them apart—removes foo because it was not in C5, presumably your C7' should also omit file foo.

    Another alternative is to compare each commit to the (single) commit just before the entire range. If your range covers commits C1, C2, C3, ..., C9, we can call the single previous commit C0. Then, instead of comparing C1 to C1^, C2 to C2^, and so on, we can compare C1 to C0, C2 to C0, C3 to C0, and so on. Depending on your definition of "changes", this may be exactly what you want, because "undoing a change" may be transitive: we remove foo in our new C6, therefore we must remove foo in our new C7 as well; we add back bar in the new C7, therefore we must add it back in the new C8 as well, and so on.
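
The difference between the two reference points can be seen on a toy history. This is only a sketch: the tag name C0, the file names, and the commit messages are invented, and foo plays the role of the file that is added partway through the range.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example" && git config user.email "example@example.com"

echo base > keep.txt && git add keep.txt && git commit -q -m "C0"
git tag C0
echo more >> keep.txt && git commit -q -am "C1"
echo new > foo && git add foo && git commit -q -m "C2 adds foo"
echo again >> keep.txt && git commit -q -am "C3 leaves foo alone"

# For each commit in the range, diff foo against the commit's own parent
# and against the fixed commit just before the range.
for c in $(git rev-list --reverse C0..HEAD); do
    vs_parent=$(git diff --name-status --no-renames "${c}^" "$c" -- foo)
    vs_base=$(git diff --name-status --no-renames C0 "$c" -- foo)
    echo "$(git log -1 --format=%s "$c"): vs parent=[$vs_parent] vs C0=[$vs_base]"
done
```

For the last commit the per-parent comparison reports nothing for foo, while the comparison against C0 still reports it as added, which is the transitive behavior described above.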

    A less-crude version of the comparison script goes like this (this can be optimized for --index-filter as well, although I will leave the work up to someone else since this is meant for illustration):

    # Note: I haven't tested this either, not sure how it behaves if
    # used inside git filter-branch.  As a --tree-filter you would not
    # really want to "git rm" anything, just to "rm" it.  As an
    # --index-filter you would want to "git rm --cached".  For
    # checkout, as a tree filter you want to extract the file into
    # the working tree, and as an index filter you want to extract
    # the file into the index.
    git diff --name-status --no-renames "$WITH_RESPECT_TO" "$GIT_COMMIT" \
        -- ...paths... |
    while read -r status path; do
        # note: $path may have embedded white space, so we
        # quote it below to protect it from breaking into words;
        # "read -r" keeps any backslashes in the name intact
        case $status in
        A) git rm -- "$path";; # file was added, rm it to undo
        D|M) git checkout "$WITH_RESPECT_TO" -- "$path";; # deleted or modified
        *) echo "file $path has strange status $status, help!" 1>&2; exit 1;;
        esac
    done
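
One possible --index-filter adaptation of the script above is sketched here on a throwaway repository. It is not the original author's code. It assumes the paths of interest are a.txt and b.txt, keeps the script in a temporary file, and restores index entries by feeding git ls-tree output to git update-index --index-info (a substitution chosen for this sketch; other ways of writing an entry into the index exist).

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example" && git config user.email "example@example.com"

echo base > a.txt && git add a.txt && git commit -q -m "C0"
base=$(git rev-parse HEAD)
echo edit >> a.txt && echo x > b.txt && git add . && git commit -q -m "C1"

# The filter script lives outside the repository so the work tree stays clean.
filter=$(mktemp)
cat > "$filter" <<'EOF'
git diff --name-status --no-renames "$WITH_RESPECT_TO" "$GIT_COMMIT" -- a.txt b.txt |
while read -r status path; do
    case $status in
    # added since the base: drop the entry from the index only
    A) git rm -q --cached --ignore-unmatch -- "$path";;
    # deleted or modified: copy the base's entry back into the index
    D|M) git ls-tree "$WITH_RESPECT_TO" "$path" | git update-index --index-info;;
    *) echo "file $path has strange status $status" 1>&2; exit 1;;
    esac
done
EOF

WITH_RESPECT_TO=$base FILTER_BRANCH_SQUELCH_WARNING=1 \
git filter-branch -f --index-filter "sh '$filter'" -- "$base"..HEAD >/dev/null 2>&1

remaining=$(git diff --name-status "$base" HEAD)
echo "differences left against the base: [$remaining]"
```

Because --prune-empty is not given here, the rewritten C1 survives as a commit whose tree equals that of C0.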
    

    Explanation: the above assumes you're filtering a (maybe linear, maybe branch-y) series of commits C1, C2, ..., Cn. You want them to "not alter the contents or even existence" of some set of paths, with respect to some parent-of-C1 commit. You must set an appropriate specifier into $WITH_RESPECT_TO. (This can come from the environment, or just be hard-coded into an actual script. Note that for your --index-filter or --tree-filter, you can have the shell run a script, rather than trying to do it all in line.)

    For instance, if you're filtering X..Y, which means "all commits reachable from label Y excluding all commits reachable from label X", it's possible that the appropriate value for $WITH_RESPECT_TO is simply X, but it is more likely the merge-base of X and Y. If X and Y are branches that look something like this:

    ...-o-o-o-o-o-o   <-- master
         \
          *-o-o       <-- X
           \
            o-o-o-o   <-- Y
    

    then you're filtering the commits on the bottom row, and the first commit to be filtered should probably be "unchanged with respect to some paths as seen in commit *" (the commit I marked with an asterisk). That's the commit that git merge-base X Y would come up with.
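
As a rough illustration, the following builds a two-branch history shaped like the diagram and asks git for the merge base. The branch names X and Y match the text; the file name f and the variable star are assumptions made for the example.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example" && git config user.email "example@example.com"

echo 0 > f && git add f && git commit -q -m "star"
star=$(git rev-parse HEAD)           # the commit marked * in the diagram

git checkout -q -b X && echo x >> f && git commit -q -am "on X"
git checkout -q -b Y "$star" && echo y >> f && git commit -q -am "on Y"

# The merge base is the natural candidate for WITH_RESPECT_TO.
WITH_RESPECT_TO=$(git merge-base X Y)
echo "$WITH_RESPECT_TO"
```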

    If you're working with raw SHA-1 IDs, you might be able to use something like:

    WITH_RESPECT_TO=676699a0e0cdfd97521f3524c763222f1c30a094 \
    git filter-branch ... (filter-branch arguments go here) ... -- \
        676699a0e0cdfd97521f3524c763222f1c30a094..branch
    

    where the raw SHA-1 is the ID of commit *, as it were.
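
Putting the pieces together, here is a hedged end-to-end sketch of the tree-filter form on a throwaway repository. The file names, commit messages, and the temporary script file are invented for the example; the star variable stands in for the raw SHA-1 of commit *, and the filter uses plain rm for added files, as the script comments above suggest for a tree filter.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example" && git config user.email "example@example.com"

echo base > a.txt && git add a.txt && git commit -q -m "C0"
star=$(git rev-parse HEAD)           # plays the role of commit *
echo edit >> a.txt && echo x > b.txt && git add . &&
    git commit -q -m "C1 touches only a.txt and b.txt"
echo y > keep.txt && git add keep.txt && git commit -q -m "C2 adds keep.txt"

# Keep the filter in a file outside the repository and have sh run it.
filter=$(mktemp)
cat > "$filter" <<'EOF'
git diff --name-status --no-renames "$WITH_RESPECT_TO" "$GIT_COMMIT" -- a.txt b.txt |
while read -r status path; do
    case $status in
    A) rm -f -- "$path";;                                   # added: remove
    D|M) git checkout -q "$WITH_RESPECT_TO" -- "$path";;    # restore
    *) echo "file $path has strange status $status" 1>&2; exit 1;;
    esac
done
EOF

WITH_RESPECT_TO=$star FILTER_BRANCH_SQUELCH_WARNING=1 \
git filter-branch -f --prune-empty --tree-filter "sh '$filter'" \
    -- "$star"..HEAD >/dev/null 2>&1

subjects=$(git log --format=%s)
echo "$subjects"
```

In this toy history C1 changes nothing except the filtered paths, so with --prune-empty it disappears, and the rewritten C2 sits directly on top of C0.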

    As for the git diff itself, let's look at the sort of output it produces:

    $ git diff --name-status --no-renames \
    >  2cd861672e1021012f40597b9b68cc3a9af62e10 \
    >  7bbc4e8fdb33e0a8e42e77cc05460d4c4f615f4d
    M       Documentation/RelNotes/1.8.5.4.txt
    A       Documentation/RelNotes/1.8.5.5.txt
    M       Documentation/git.txt
    M       GIT-VERSION-GEN
    M       RelNotes
    

    (this is actual output of git diff on the source tree for git itself). Between those two revisions, one release-notes text file was modified, one was added, Documentation/git.txt was modified, and so on. Now let's try that again but restricting it to one real pathname and one fake one:

    $ git diff --name-status --no-renames \
    >  2cd861672e1021012f40597b9b68cc3a9af62e10 \
    >  7bbc4e8fdb33e0a8e42e77cc05460d4c4f615f4d \
    >  -- Documentation/RelNotes/1.8.5.5.txt NoSuchFile
    A       Documentation/RelNotes/1.8.5.5.txt
    

    Now we find out about the one added file, but there is no complaint about the nonexistent file. So it's OK to give "nonexistent" paths; they simply won't occur in the output.

    If diffing commit $WITH_RESPECT_TO against some later commit C says that path p is added in commit C, we know that it does not exist in $WITH_RESPECT_TO and does in C, so we want to remove it so that it's "unchanged". (This is the case for status-letter A.)

    If the diff says that path p is deleted in C, we know that it does exist in $WITH_RESPECT_TO, and it must be restored to remain "unchanged". (This is the case for status-letter D.)

    If the diff says that path p exists in both, but the contents of the file differ in C, the contents must be restored to remain "unchanged". (This is the case for status-letter M.)

    Other diff status letters are C, R, T, U, X, and B, but some cannot occur here: we exclude C, R, and B by specifying appropriate git diff options; U only occurs during incomplete merges; and X should never occur (see the related question "What do the Git 'pairing broken' and 'unknown' statuses mean, and when do they occur?"). The T case is probably cause to abort the filtering (a regular file changed to a symlink, or vice versa, for instance; or something replaced with a submodule).

    If, after thinking about the issue for a while, you decide that "with respect to" should use parent commit(s), you can use git diff-tree, which—given a single commit—compares the tree of the commit with those of its parents. (But again, note its behavior on merge commits, and make sure that's what you want.)
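
For the parent-relative definition, a single-commit git diff-tree call might look like the sketch below. GIT_COMMIT is set by hand to stand in for the value filter-branch would supply, and the repository and file names are assumptions made for the example.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example" && git config user.email "example@example.com"

echo 1 > a.txt && git add a.txt && git commit -q -m "first"
echo 2 > a.txt && echo n > b.txt && git add . && git commit -q -m "second"

# Given one commit, diff-tree compares it with its parent;
# --no-commit-id drops the leading line that repeats the commit hash.
GIT_COMMIT=$(git rev-parse HEAD)
status=$(git diff-tree -r --no-commit-id --name-status --no-renames "$GIT_COMMIT")
echo "$status"
```

The output has the same status-letter format as the git diff --name-status examples earlier, so the same case statement could consume it.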

    1 When using --tree-filter, it actually does the full blown check-everything-out part. With --index-filter it writes the commit into the index, but not actually into the file system, and lets you make all the changes within the index. With --env-filter, --msg-filter, --parent-filter, and --commit-filter, it lets you change the text, author, and/or parents of each commit. The --tag-name-filter lets you alter the tag names if needed, and causes the new names to point to the new commits instead of the old ones (hence --tag-name-filter cat leaves the names unchanged and makes those that pointed to the old commits, now point to the new ones).

    The --prune-empty covers an edge case: if you have a chain of commits C1 <- C2 <- C3, and your C2' (your copy of C2) has the same underlying tree as your C1', comparing the trees of C2' and C1' produces an empty diff. The filter-branch operation normally keeps these, but omits them if you use --prune-empty: your new chain will then be C1' <- C3'. But note that the original chain may have "empty" commits; in this case, filter-branch will prune those even if the copies are actually the same as the originals.

    2 These scripts are written as if in script files. If you turn them into one-liners you will need to add semicolons as necessary, and perhaps also turn exit into return, since you don't want the whole thing to exit when evaled.
