Remove file from git repository (history)
(solved, see bottom of the question body)
I have been looking for a way to do this for a long time. What I have found so far:
- http://dound.com/2009/04/git-forever-remove-files-or-folders-from-history/
- http://progit.org/book/ch9-7.html
Both describe pretty much the same method, but both of them leave objects in pack files... I'm stuck.
What I tried:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch file_name'
rm -Rf .git/refs/original
rm -Rf .git/logs/
git gc
Still have files in the pack, and this is how I know it:
git verify-pack -v .git/objects/pack/pack-3f8c0...bb.idx | sort -k 3 -n | tail -3
And this:
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch file_name" HEAD
rm -rf .git/refs/original/ && git reflog expire --all && git gc --aggressive --prune
The same...
Tried the git clone trick; it removed some of the files (~3000 of them), but the largest files are still there...
I have some large legacy files in the repository, ~200M, and I really don't want them there... And I don't want to reset the repository to 0 :(
SOLUTION: This is the shortest way to get rid of the files:
- Check .git/packed-refs. My problem was that it contained a refs/remotes/origin/master line for a remote repository. Delete that line, otherwise git won't remove those files.
- (optional) Check for the largest files:
  git verify-pack -v .git/objects/pack/#{pack-name}.idx | sort -k 3 -n | tail -5
- (optional) Check what those files are:
  git rev-list --objects --all | grep a0d770a97ff0fac0be1d777b32cc67fe69eb9a98
- Remove the file from all revisions:
  git filter-branch --index-filter 'git rm --cached --ignore-unmatch file_names'
- Remove git's backup:
  rm -rf .git/refs/original/
- Expire all reflog entries, so they no longer reference the old objects:
  git reflog expire --all --expire='0 days'
- Check whether there are any unreachable objects:
  git fsck --full --unreachable
- Repack:
  git repack -A -d
- Finally remove those objects:
  git prune
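For the optional inspection step in the list above, the sort-and-tail pipeline also picks up the summary lines that git verify-pack -v prints at the end. A minimal sketch of a filter that keeps only object lines might look like this. The function name largest_objects is made up for illustration, and it assumes the usual verify-pack -v column order (object ID, type, size, ...).

```shell
# Read `git verify-pack -v` output on stdin and print the N largest objects
# as "size type id", smallest first. Summary lines ("non delta: ...",
# "chain length = ...", "<pack>: ok") are skipped because their second
# field is not an object type.
largest_objects() {
    count=${1:-5}
    awk '$2 ~ /^(commit|tree|blob|tag)$/ { print $3, $2, $1 }' |
        sort -n |
        tail -n "$count"
}
```

It could then be used as: git verify-pack -v .git/objects/pack/pack-*.idx | largest_objects 5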
I can't say for sure without access to your repository data, but I believe there are probably one or more packed refs still referencing old commits from before you ran git filter-branch. This would explain why git fsck --full --unreachable doesn't call the large blob an unreachable object, even though you've expired your reflog and removed the original (unpacked) refs.
Here's what I'd do (after git filter-branch and git gc have been done):
1) Make sure original refs are gone:
rm -rf .git/refs/original
2) Expire all reflog entries:
git reflog expire --all --expire='0 days'
3) Check for old packed refs
This could potentially be tricky, depending on how many packed refs you have. I don't know of any Git commands that automate this, so I think you'll have to do this manually. Make a backup of .git/packed-refs. Now edit .git/packed-refs. Check for old refs (in particular, see if it packed any of the refs from .git/refs/original). If you find any old ones that don't need to be there, delete them (remove the line for that ref).
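If editing by hand feels error-prone, the backup-then-delete step could be sketched roughly as below. This is only an illustration: drop_packed_refs is a hypothetical helper, not a Git command, and it assumes the usual packed-refs layout (one "<id> <refname>" pair per line, "#" header lines, and "^<id>" lines that belong to the tag ref directly above them).

```shell
# Remove every ref whose name matches an extended regex from a packed-refs
# file, after saving a copy as <file>.bak.
drop_packed_refs() {
    file=$1
    pattern=$2
    cp "$file" "$file.bak" || return 1
    awk -v pat="$pattern" '
        /^#/  { print; next }                # keep header comment lines
        /^\^/ { if (!skip) print; next }     # peeled line follows its ref
        { skip = ($2 ~ pat); if (!skip) print }
    ' "$file.bak" > "$file"
}
```

For example, drop_packed_refs .git/packed-refs '^refs/original/' would drop only the filter-branch backup refs. Whether remote-tracking refs should go as well depends on your repository.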
After you finish cleaning up the packed-refs file, see if git fsck notices the unreachable objects:
git fsck --full --unreachable
If that worked, and git fsck now reports your large blob as unreachable, you can move on to the next step.
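To avoid scanning the fsck output by eye, a small check along these lines might help. It is a sketch that assumes fsck prints lines of the form "unreachable <type> <id>". The name reported_unreachable is invented for this example.

```shell
# Succeed if the given object ID is listed as unreachable in the
# `git fsck --full --unreachable` output read from stdin.
reported_unreachable() {
    grep -q "^unreachable [a-z]* $1\$"
}
```

Usage would be something like: git fsck --full --unreachable | reported_unreachable a0d770a97ff0fac0be1d777b32cc67fe69eb9a98 && echo "safe to repack and prune"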
4) Repack your packed archive(s)
git repack -A -d
This will ensure that the unreachable objects get unpacked and stay unpacked.
5) Prune loose (unreachable) objects
git prune
And that should do it. Git really should have a better way to manage packed refs. Maybe there is a better way that I don't know about. In the absence of a better way, manual editing of the packed-refs file might be the only way to go.
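Putting the steps above together, the whole sequence might be scripted roughly as follows. Treat it as a sketch under several assumptions, not a definitive recipe. purge_path is a made-up name. It uses --expire=now where the steps above use '0 days'. It deletes the backup refs with git update-ref -d (which, as far as I know, also removes packed entries) instead of rm -rf. It sets FILTER_BRANCH_SQUELCH_WARNING, which recent Git versions honor to skip the filter-branch warning. It does not touch remote-tracking refs, so a refs/remotes/... line in packed-refs can still pin the old objects.

```shell
# Remove a path from all history and drop the objects that become
# unreachable. Run from the top level of a non-bare repository with a
# clean working tree. The path must not contain a single quote.
purge_path() {
    target=$1
    # Rewrite every ref so that no commit contains the path.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
        --index-filter "git rm -r --cached --ignore-unmatch -- '$target'" \
        -- --all || return 1
    # Delete the backup refs that filter-branch leaves under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done
    # Expire reflog entries so they no longer reference the old commits.
    git reflog expire --all --expire=now || return 1
    # Repack; unreachable objects are left loose so that prune can delete them.
    git repack -A -d || return 1
    git prune
}
```

Try it on a throwaway clone first, for example purge_path path/to/large_file, and compare the output of git count-objects -v before and after.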