Clean git history of deleted files, keeping renamed files' history

Question

I'd like to extract some files to a new repo, keeping their history, including file renames.

Best and closest answer I could find was new-repo-with-copied-history-of-only-currently-tracked-files, using git filter-branch --index-filter. It successfully keeps history of existing files, but it doesn't preserve history of renamed files.

(Another answer I could find was using git filter-branch --subdirectory-filter. But it has two issues: doesn't seem to work for the whole repo (folder '.') and doesn't preserve history of renamed files.)

(Yet another answer was using git subtree. But it doesn't keep history at all.)

So I'm probably looking for a way to improve the git ls-files > keep-these.txt command from the closest answer so that it also lists all previous file names. Maybe a script?

Solution

Git doesn't store file name changes.

Each commit stores a complete tree, e.g., perhaps commit 1234567... has files README and foo.txt and commit fedcba9... has files readme.txt and foo. If you ask git to compare commit 1234567 to commit fedcba9, and README is sufficiently similar [1] to readme.txt, git will say that the way to transform the one commit to the other is to rename the file. (If the one commit is the parent of the other, git show of the child commit will show the rename, because git show computes this change at git show time.)

On the other hand, if the second readme file is too different, but README is sufficiently similar to foo, git will say that the way to change 1234567 to achieve fedcba9 is to rename README to foo.

The key is that git computes that when you ask for the comparison, and not a moment earlier. There's nothing in between the commits that says "rename some files". Git simply compares the commits and decides then whether the files are similar enough.
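To see that the rename is computed at comparison time, here is a minimal sketch in a throwaway repository. The temporary directory, the identity settings, and the file contents are assumptions made up for the demo; only the git commands themselves matter.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository; the path and file contents are made up for this demo.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# First commit: a file named README.
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > README
git add README
git commit -q -m "add README"

# Second commit: the same content under a new name.
# Nothing stored in either commit says "rename".
git mv README readme.txt
git commit -q -m "move README"

# With rename detection switched off, the change is a delete plus an add.
no_detect=$(git diff --no-renames --name-status HEAD~1 HEAD)
# With -M, git compares the two trees and reports a rename instead.
detect=$(git diff -M --name-status HEAD~1 HEAD)

printf 'without rename detection:\n%s\n\nwith -M:\n%s\n' "$no_detect" "$detect"
```

Both outputs come from the same two commits; only the comparison options differ.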

For your purposes, what this ultimately means is that for each commit in your sequence-of-commits-to-copy-or-partially-copy, you'll have to decide which path names to keep and which to discard. How to achieve that is mostly up to you. The git log command does have a --follow flag to activate a limited amount of rename detection as it works backwards from child commits to their parents, and git blame automatically tries to do the same; you could use these (one path name at a time) to come up with a mapping of the form:

      in:   commits A..B    C..D             E..F
use path:   dir/file.ext    dir/frill.txt    lib/frill.next

for instance. But there's nothing built in to do this, and it won't be particularly easy. I'd start by combining git log --follow with --raw or --name-status output and seeing if there are any interesting Renames detected. If and when there are, those are the commit boundaries at which you'll want to change which paths you're keeping and discarding as you work through commits (whether that's with filter-branch or some other method).

If that doesn't work, or you need more control, consider running git diff --name-status between various commit pairs (with commit pair info coming from git rev-list).
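The commit-pair approach could be sketched roughly as follows. The function name list_renames is hypothetical, and the sketch compares each commit only with its first parent, so merges are handled only partially.

```shell
#!/usr/bin/env bash
set -euo pipefail

# list_renames [REV]: print "commit<TAB>old<TAB>new" for every rename git
# detects between a commit and its first parent, oldest commit first.
# (Hypothetical helper; path names containing tabs or newlines are not handled.)
list_renames() {
    local rev=${1:-HEAD}
    git rev-list --reverse --parents "$rev" |
    while read -r commit parent rest; do
        # Root commits have no parent to compare against.
        [ -n "$parent" ] || continue
        git diff -M --diff-filter=R --name-status "$parent" "$commit" |
        while IFS=$'\t' read -r status old new; do
            printf '%s\t%s\t%s\n' "$commit" "$old" "$new"
        done
    done
}
```

Run inside a repository, `list_renames HEAD` gives one line per detected rename, which marks the commit boundaries where the set of paths to keep changes.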


[1] As long as you've asked for rename detection, "exactly the same" is sufficiently similar, as is anything down to about "50% similar". You can tweak the required similarity with the optional value you supply to git diff's -M flag.
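A small sketch of the threshold at work, again in a throwaway repository with made-up file names: the same renamed-and-edited file counts as a rename under the default threshold but not under -M100%, which accepts only exact matches.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)   # throwaway repository for the demo
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit a 20-line file, then rename it and append one line.
seq 1 20 | sed 's/^/some content on line /' > notes.txt
git add notes.txt
git commit -q -m "add notes.txt"
git mv notes.txt journal.txt
echo "one extra line" >> journal.txt
git commit -q -a -m "rename and edit"

# Default threshold: the two files are still similar enough to be a rename.
loose=$(git diff -M --name-status HEAD~1 HEAD)
# -M100% accepts only exact matches, so the same change is a delete plus an add.
strict=$(git diff -M100% --name-status HEAD~1 HEAD)

printf 'default -M:\n%s\n\n-M100%%:\n%s\n' "$loose" "$strict"
```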


Edit: this seems to work OK. I used it on git's own builtin/var.c, which used to have two previous names according to this:

$ git log --follow --raw --diff-filter=R --pretty=format:%H builtin/var.c
81b50f3ce40bfdd66e5d967bf82be001039a9a98
:100644 100644 2280518... 2280518... R100       builtin-var.c   builtin/var.c

55b6745d633b9501576eb02183da0b0fb1cee964
:100644 100644 d9892f8... 2280518... R096       var.c   builtin-var.c

The --diff-filter suppresses everything but rename outputs so that we get to see which commit seems to rename the file. Turning this into something more useful requires a bit more work, but this might get you fairly far:

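# Print one line per rename that --follow detects for builtin/var.c.
git log --follow --raw --diff-filter=R --pretty=format:%H builtin/var.c |
while read -r hash; do
    # The raw diff line is tab-separated: "<modes, ids, status>", old name, new name.
    IFS=$'\t' read -r mode_etc oldname newname
    # A blank line separates commits; there is none after the last one.
    read -r blankline || true
    echo "in $hash, rename $oldname to $newname"
done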

which produced:

in 81b50f3ce40bfdd66e5d967bf82be001039a9a98, rename builtin-var.c to builtin/var.c
in 55b6745d633b9501576eb02183da0b0fb1cee964, rename var.c to builtin-var.c
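For the keep-these.txt list from the question, one possible sketch is to collect every name that git log --follow reports for each currently tracked file. The function name all_names_ever is hypothetical. The result is only as good as --follow's rename detection, and paths that git quotes (unusual characters) would need extra handling.

```shell
#!/usr/bin/env bash
set -euo pipefail

# all_names_ever: for every currently tracked file, print each path that
# `git log --follow` reports for it, past or present, without duplicates.
# (Hypothetical helper; one `git log` per file, so slow on large repositories.)
all_names_ever() {
    git ls-files |
    while IFS= read -r path; do
        # --name-only with an empty format prints just the path at each commit.
        git log --follow --name-only --pretty=format: -- "$path"
    done |
    sed '/^$/d' | sort -u
}
```

Inside the repository, `all_names_ever > keep-these.txt` would then stand in for `git ls-files > keep-these.txt`.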
