Why do large files still exist in my packfile, after scrubbing them with filter-branch?


Question


I've rewritten the history of my repository to remove some large FLV files using git filter-branch. I primarily followed the GitHub article on removing sensitive data and similar instructions found elsewhere on the Internet:

Removing the large FLVs:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch public/video/*.flv' --prune-empty -- --all

Removing the original refs:

rm -rf .git/refs/original/

Clearing the reflog:

git reflog expire --expire=now --all

Pruning unreachable objects:

git gc --prune=now

Aggressively pruning unreachable objects:

git gc --aggressive --prune=now

Repacking things:

git repack -A -d

And my gitdir is still 205 MB, contained almost entirely in a single packfile:

$ du -h .git/objects/pack/*
284K    .git/objects/pack/pack-f72ed7cee1206aae9a7a3eaf75741a9137e5a2fe.idx
204M    .git/objects/pack/pack-f72ed7cee1206aae9a7a3eaf75741a9137e5a2fe.pack

Using this script, I can see that the FLVs I've removed are still contained in the pack:

All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file.
size   pack   SHA                                       location
17503  17416  1be4132fa8d91e6ce5c45caaa2757b7ea87d87b0  public/video/XXX_FINAL.flv
17348  17261  b7aa83e187112a9cfaccae9206fc356798213c06  public/video/YYY_FINAL.flv
....

Cloning the repository via git clone --bare my-repo yields my-repo.git which is also 205MB in size.

What can I do to remove these (presumably) unreferenced objects from the pack and shrink my repository back to the size it would be if they'd never been committed? If they are still referenced somehow, is there a way to tell where?
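One way to answer the "is there a way to tell where?" part is to walk every ref and check whether the blob is reachable from it. The following is a minimal sketch, not a definitive tool: it builds a hypothetical scratch repository (the file names, the `g` helper and the demo identity are all assumptions made for illustration), imitates the backup ref that filter-branch leaves behind, and then lists the refs from which the blob can still be reached.

```shell
#!/bin/sh
set -e
# Hypothetical helper: run git with a throwaway identity for the demo commits.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Scratch repository used only for this illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "small file" > keep.txt
echo "stand-in for a large video" > big.flv
git add keep.txt big.flv
g commit -q -m "add files"
blob=$(git rev-parse HEAD:big.flv)
branch=$(git symbolic-ref HEAD)

# Imitate filter-branch: back up the old tip under refs/original/,
# then rewrite the branch so that it no longer contains big.flv.
git update-ref "refs/original/$branch" HEAD
git rm -q --cached big.flv
g commit -q --amend -m "add files"

# For every ref, list all reachable objects and look for the blob's ID.
holders=$(
    git for-each-ref --format='%(refname)' |
    while read -r ref; do
        if git rev-list --objects "$ref" | grep -q "^$blob"; then
            echo "$ref"
        fi
    done
)
echo "refs that still reach $blob:"
echo "$holders"
```

In a real repository only the loop is needed, with `blob` set to one of the object IDs from the size listing above. Note that this checks refs only; reflog entries can also keep objects alive, which is why the steps in the question expire the reflog as well.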

Update

Upon attempting to re-run git filter-branch, I received this notice:

Cannot create a new backup.
A previous backup already exists in refs/original/
Force overwriting the backup with -f

I verified that there were no refs in .git/refs/original; indeed, the directory didn't exist at all. Is there some other way that Git stores refs that I don't know about?
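Git can store refs in the single file .git/packed-refs as well as in loose files under .git/refs, so a missing directory does not by itself prove that a ref is gone. The sketch below illustrates this in a hypothetical scratch repository; the ref name refs/original/refs/heads/main and the demo identity are assumptions chosen for the example, and `git pack-refs --all` is run explicitly here (`git gc` runs it by default).

```shell
#!/bin/sh
set -e
# Scratch repository used only for this illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# Simulate the kind of backup ref that git filter-branch leaves behind.
git update-ref refs/original/refs/heads/main HEAD

# Pack all refs into .git/packed-refs; loose copies are pruned by default.
git pack-refs --all

# The loose file is gone...
if [ -e .git/refs/original/refs/heads/main ]; then loose=yes; else loose=no; fi
# ...but Git still sees the ref, because it now lives in .git/packed-refs.
listed=$(git for-each-ref --format='%(refname)' refs/original/)
echo "loose file present: $loose"
echo "refs Git still sees: $listed"
grep refs/original .git/packed-refs
```

Asking Git itself with `git for-each-ref refs/original/` is therefore a more reliable check than inspecting the .git/refs directory.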

Solution

Upon cloning a fresh copy of the repository, I was able to run the commands exactly as above, and achieve the desired result: My .git directory was reduced from 205 MB down to 20 MB, and the large FLV files were removed cleanly from the packfile.

The first attempt was also performed on a fresh clone to which I had made no modifications, so I do not have a satisfying explanation for why the FLV files continued to linger inside the packfile.

I originally submitted the below answer, thinking that I'd caused a problem by running git repack -a before removing .git/refs/original, causing the original refs to become packed so that when I did remove .git/refs/original there was no effect; my original refs would still be referencing the large FLV files. This doesn't seem to hold up, however. Running the above commands on a freshly cloned copy of the repository with the addition of git repack -a immediately after git filter-branch doesn't seem to affect the outcome - the FLV files are still purged from the packfile. I have no reason to believe this is relevant to the original problem.


Is there some other way that git stores refs, that I don't know about?

There is. It turns out I wasn't entirely truthful about the order of commands as listed above. I had run git repack -a before running rm -rf .git/refs/original, and Git had packed the refs away (to be determined where; experimenting now). When I then ran rm -rf .git/refs/original, nothing was removed. git gc was unable to shrink my packfile because I still had lingering references to the old files due to the packed refs/original refs.
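If packed refs are the concern, the backup refs can be deleted through Git rather than with rm, so that loose files and entries in .git/packed-refs are handled alike. This is a minimal sketch under stated assumptions: `remove_original_refs` is a hypothetical helper name, and the scratch repository, ref name and demo identity exist only to show the effect.

```shell
#!/bin/sh
set -e
# Hypothetical helper: delete every ref under refs/original/ via git update-ref,
# which removes the ref whether it is loose or stored in .git/packed-refs.
remove_original_refs() {
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done
}

# Scratch repository used only for this illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
git update-ref refs/original/refs/heads/main HEAD
git pack-refs --all   # the backup ref now exists only in .git/packed-refs

remove_original_refs
remaining=$(git for-each-ref --format='%(refname)' refs/original/)
echo "refs/original entries left: ${remaining:-none}"
```

With the backup refs gone, the reflog expiry and `git gc --prune=now` steps from the question have nothing under refs/original left to keep the old objects reachable.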
