Rewrite git history to modify a file


Question

To remove a large unwanted file from all of git history, you can use filter-branch to rewrite the index (the list of files in the repo) of each commit so that the file is never added:

git filter-branch --index-filter "git rm --cached --ignore-unmatch path/to/offending_file.wav" --tag-name-filter cat -- --all

But what if I want to keep the file and make it much smaller (for example, an icon that was accidentally stored as a huge image)? I tried this approach:

First, add the replacement file to git's object database:

HASH=`git hash-object -w /tmp/replacement.png`

Also note the path of the file we want to replace:

FILE="path/to/icon.png"

Now filter the index as follows. First check that the file exists at this commit:

git cat-file -e :"$FILE"

If so, remove it from the index:

git rm --cached "$FILE"

Finally, add an index entry that points the same filename at our replacement blob:

git update-index --add --cacheinfo "100644,$HASH,$FILE"
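The three steps above can be tried on a single commit before running them through filter-branch. The following is a minimal sketch in a throwaway repository; the temporary directory, the file contents, and the `Demo` identity are illustrative assumptions, not part of the original question.

```shell
#!/bin/sh
# Scratch-repo sketch of the per-commit index edit (all names are illustrative).
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q .

FILE="path/to/icon.png"
mkdir -p "$(dirname "$FILE")"
printf 'pretend this is a huge image\n' > "$FILE"
git add "$FILE"
git -c user.name=Demo -c user.email=demo@example.invalid commit -q -m "Add oversized icon"

# Store the replacement content as a blob and remember its hash.
HASH=$(printf 'small icon\n' | git hash-object -w --stdin)

# Swap the index entry only if the path is present in the index.
if git cat-file -e ":$FILE" 2>/dev/null; then
    git rm -q --cached "$FILE"
    git update-index --add --cacheinfo "100644,$HASH,$FILE"
fi

# The staged blob for the path is now the replacement.
STAGED=$(git rev-parse ":$FILE")
echo "staged blob: $STAGED"
```

Only the index changes here; the working tree file and the object database still contain the old content until a commit is made and the old blob is pruned.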

Putting it all together:

git filter-branch --index-filter "if git cat-file -e \":$FILE\" ; then git rm --cached \"$FILE\" ; git update-index --add --cacheinfo \"100644,$HASH,$FILE\" ; fi" --tag-name-filter cat -- --all
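As a hedged variant of the combined command, the filter can be passed in single quotes and read `HASH` and `FILE` from the environment, which sidesteps quoting problems for paths containing spaces. The sketch below runs it against a throwaway two-commit repository; the repository contents and the `Demo` identity are assumptions made for the demonstration, and `FILTER_BRANCH_SQUELCH_WARNING` only has an effect on newer Git versions.

```shell
#!/bin/sh
# Sketch: run the index filter over a scratch repo (illustrative data only).
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q .
commit() { git -c user.name=Demo -c user.email=demo@example.invalid commit -q -m "$1"; }

FILE="path/to/icon.png"
mkdir -p "$(dirname "$FILE")"
printf 'huge image\n' > "$FILE"; git add "$FILE"; commit "Add oversized icon"
printf 'notes\n' > README;       git add README;  commit "Add README"

# Replacement blob, written straight into the object database.
HASH=$(printf 'small icon\n' | git hash-object -w --stdin)

# Export so the filter, which filter-branch evaluates in its own shell, can see them.
export HASH FILE
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --index-filter '
    if git cat-file -e ":$FILE" 2>/dev/null; then
        git rm -q --cached "$FILE"
        git update-index --add --cacheinfo "100644,$HASH,$FILE"
    fi
' --tag-name-filter cat -- --all >/dev/null

# Both rewritten commits should now carry the replacement blob.
NEW_TIP=$(git rev-parse "HEAD:$FILE")
NEW_FIRST=$(git rev-parse "HEAD~1:$FILE")
echo "tip: $NEW_TIP first: $NEW_FIRST"
```

The old commits remain reachable through `refs/original/` and the reflog after this runs, which is why the original blob is not yet collectable.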

This seems to work and doesn't print any errors that are too scary. However, no matter how many git gc and prune commands I try, the original blob still exists in the repository. Even if I clone the repo to a new place, it is still there.

I suspect this is because the remote refs, and the original refs that filter-branch creates, still point to the old trees, so the original file is still referenced.

I did try removing them all with a hack like this:

for REF in `git show-ref | cut -c 42- | grep original` ; do git update-ref -d $REF ; done

And the same for the remotes, but the blob is still there.

So my questions:

  1. Is there a way to see why a blob isn't garbage collected? I.e. which parent objects in the graph point to it?
  2. Is there a non-hacky way to remove the original refs (and maybe the remotes), including all branches and tags?
  3. Is there anything else I'm missing?
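For the first question, one possible approach (not taken from the answer below) is to ask git which commits introduce or drop the blob and under which path it is still reachable. This sketch assumes Git 2.16 or newer for `--find-object`; the scratch repository and file names are illustrative.

```shell
#!/bin/sh
# Sketch: find what still references a blob (scratch repo, illustrative data).
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q .
commit() { git -c user.name=Demo -c user.email=demo@example.invalid commit -q -m "$1"; }

printf 'notes\n' > README;        git add README;   commit "Add README"
printf 'huge image\n' > icon.png; git add icon.png; commit "Add oversized icon"
BLOB=$(git rev-parse HEAD:icon.png)
printf 'small icon\n' > icon.png; git add icon.png; commit "Shrink icon"

# Commits whose diff adds or removes the blob (needs Git 2.16 or newer).
TOUCHING=$(git log --all --format=%H --find-object="$BLOB")

# Path under which the blob is reachable from any ref or reflog entry.
REACHABLE_AS=$(git rev-list --all --reflog --objects | grep "^$BLOB" | cut -d' ' -f2-)

echo "commits touching blob:"
echo "$TOUCHING"
echo "reachable as: $REACHABLE_AS"
```

If the `rev-list` line prints nothing, no ref or reflog entry reaches the blob any more, and it becomes a candidate for pruning.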

Recommended answer

Aha, I've done it! I think.

Here are the extra steps. First, it's a good idea to note the hash of the blob you want to get rid of at the start, so you can check whether it still exists with:

git cat-file -t 949abcd....

OK, so first I expired the reflog, since it still holds references to the original commits:

git reflog expire --expire=now --all

Next I removed the origin remote, since its remote-tracking refs still point to the original tree. I guess that if you push the new hashes (you will probably need to force push), this step becomes unnecessary and the file should eventually be garbage collected anyway.

git remote rm origin

Next I removed the original refs (the ones filter-branch creates under refs/original/). I didn't find a less hacky way:

for REF in `git show-ref | cut -c 42- | grep original` ; do git update-ref -d $REF ; done
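A possibly less hacky alternative to the loop above is to let `git for-each-ref` select everything under `refs/original/` and feed the deletions to `git update-ref --stdin`. This is a sketch rather than what the answer used; the backup ref it creates by hand merely stands in for the ones filter-branch would leave behind.

```shell
#!/bin/sh
# Sketch: delete every ref under refs/original/ without parsing show-ref output.
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q .
git -c user.name=Demo -c user.email=demo@example.invalid commit -q --allow-empty -m "Initial commit"

# Simulate a backup ref of the kind filter-branch leaves behind.
git update-ref refs/original/refs/heads/demo HEAD

# Emit one "delete <refname>" instruction per backup ref and apply them together.
git for-each-ref --format='delete %(refname)' refs/original/ | git update-ref --stdin

REMAINING=$(git for-each-ref refs/original/)
echo "remaining backup refs: ${REMAINING:-none}"
```

Matching on the `refs/original/` prefix avoids the fixed column offset in `cut -c 42-` and does not accidentally catch other refs whose names merely contain the word "original".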

Finally, garbage collect. I'm not sure whether --aggressive is required, but --prune=now definitely is: by default, for safety, git gc only prunes unreachable objects older than a grace period (two weeks unless gc.pruneExpire says otherwise), so a freshly orphaned blob would survive.

git gc --aggressive --prune=now
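The effect of the grace period can be seen in isolation with an unreachable blob in a throwaway repository. This is a minimal sketch assuming default gc settings (no custom `gc.pruneExpire`); the blob content and repository are illustrative.

```shell
#!/bin/sh
# Sketch: a fresh unreachable blob survives plain gc but not --prune=now.
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q .
git -c user.name=Demo -c user.email=demo@example.invalid commit -q --allow-empty -m "Initial commit"

# Write a blob that no commit, ref, or index entry points to.
BLOB=$(printf 'orphaned blob\n' | git hash-object -w --stdin)

# Plain gc keeps it, because it is younger than the prune grace period.
git gc --quiet
if git cat-file -e "$BLOB" 2>/dev/null; then AFTER_DEFAULT_GC=present; else AFTER_DEFAULT_GC=gone; fi

# Expiring the reflog and pruning immediately removes it.
git reflog expire --expire=now --all
git gc --quiet --prune=now
if git cat-file -e "$BLOB" 2>/dev/null; then AFTER_PRUNE_NOW=present; else AFTER_PRUNE_NOW=gone; fi

echo "after default gc: $AFTER_DEFAULT_GC, after --prune=now: $AFTER_PRUNE_NOW"
```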

After all these steps, git cat-file reports that the blob is gone! I haven't experimented with pushing the result back to origin (after re-adding the remote), and I'm not 100% sure which of the above steps are necessary, but this has worked so far.
