Remove file from git repository (history)

Problem description

(solved, see bottom of the question body)
I have been looking for a way to do this for a long time. What I have so far is:

Pretty much the same method, but both of them leave objects in pack files... Stuck.
What I tried:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch file_name'
rm -Rf .git/refs/original
rm -Rf .git/logs/
git gc

Still have files in the pack, and this is how I know it:

git verify-pack -v .git/objects/pack/pack-3f8c0...bb.idx | sort -k 3 -n | tail -3

And also this:

git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch file_name" HEAD
rm -rf .git/refs/original/ && git reflog expire --all &&  git gc --aggressive --prune

Same result...

Tried git clone trick, it removed some of the files (~3000 of them) but the largest files are still there...

I have some large legacy files in the repository, ~200M, and I really don't want them there... And I don't want to reset the repository to 0 :(

SOLUTION: This is the shortest way to get rid of the files:

  1. Check .git/packed-refs. My problem was that it contained a refs/remotes/origin/master line for a remote repository; delete that line, otherwise git won't remove those files
  2. (optional) git verify-pack -v .git/objects/pack/#{pack-name}.idx | sort -k 3 -n | tail -5 - to list the largest objects
  3. (optional) git rev-list --objects --all | grep a0d770a97ff0fac0be1d777b32cc67fe69eb9a98 - to find out which file an object belongs to
  4. git filter-branch --index-filter 'git rm --cached --ignore-unmatch file_names' - to remove the files from all revisions (of the current branch, unless you append -- --all)
  5. rm -rf .git/refs/original/ - to remove the backup refs left by git filter-branch
  6. git reflog expire --all --expire='0 days' - to expire all reflog entries
  7. git fsck --full --unreachable - to check whether the objects are now reported as unreachable
  8. git repack -A -d - to repack
  9. git prune - to finally remove those objects
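
Steps 2 and 3 can be combined so that each large object is printed together with its path. The following is only a sketch: largest_objects is a hypothetical helper name, not a git command, and it assumes it is run from the top of a non-bare repository that has at least one pack file.

```shell
# largest_objects [N]: print "<size> <type> <sha> <path>" for the N largest
# packed objects (default 5). The path is empty for objects that are no
# longer reachable from any ref, and for commits.
largest_objects() {
    n="${1:-5}"
    for idx in .git/objects/pack/pack-*.idx; do
        [ -e "$idx" ] || continue
        git verify-pack -v "$idx"
    done |
        # keep only object lines: <sha> <type> <size> <size-in-pack> <offset> ...
        grep -E '^[0-9a-f]{40,} +(blob|tree|commit|tag) ' |
        sort -k 3 -n |
        tail -n "$n" |
        while read -r sha type size rest; do
            # rev-list --objects prints "<sha> <path>" for reachable blobs and trees
            path=$(git rev-list --objects --all | sed -n "s/^$sha //p" | head -n 1)
            printf '%s %s %s %s\n' "$size" "$type" "$sha" "$path"
        done
}
```

Calling largest_objects 3 lists the three largest objects, smallest first. The size shown is the uncompressed object size reported by git verify-pack.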

Recommended answer

I can't say for sure without access to your repository data, but I believe there are probably one or more packed refs still referencing old commits from before you ran git filter-branch. This would explain why git fsck --full --unreachable doesn't call the large blob an unreachable object, even though you've expired your reflog and removed the original (unpacked) refs.

Here's what I'd do (after git filter-branch and git gc have been done):

1) Make sure the original refs are gone:

rm -rf .git/refs/original

2) Expire all reflog entries:

git reflog expire --all --expire='0 days'

3) Check for old packed refs

This could potentially be tricky, depending on how many packed refs you have. I don't know of any Git commands that automate this, so I think you'll have to do this manually. Make a backup of .git/packed-refs. Now edit .git/packed-refs. Check for old refs (in particular, see if it packed any of the refs from .git/refs/original). If you find any old ones that don't need to be there, delete them (remove the line for that ref).
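
As a possible alternative to editing the file by hand: git for-each-ref lists every ref whether it is stored loose or in packed-refs, and git update-ref -d deletes a ref in either place. The sketch below relies on that; delete_refs_under is a hypothetical helper name, and the prefix you pass is your own choice.

```shell
# delete_refs_under PREFIX: delete every ref below PREFIX, loose or packed.
# Inspect the list first with:
#   git for-each-ref --format='%(objectname) %(refname)'
delete_refs_under() {
    prefix="$1"
    git for-each-ref --format='%(refname)' "$prefix" |
        while read -r ref; do
            git update-ref -d "$ref"
        done
}
```

For example, delete_refs_under refs/original/ removes the backup refs left by git filter-branch. A single stale ref such as the refs/remotes/origin/master entry mentioned in the question could be removed with git update-ref -d refs/remotes/origin/master, but only if you are sure it still points at the old history.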

After you finish cleaning up the packed-refs file, see if git fsck notices the unreachable objects:

git fsck --full --unreachable

If that worked, and git fsck now reports your large blob as unreachable, you can move on to the next step.
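
If the fsck output is long, the check for one particular object can be scripted. This is a minimal sketch; is_unreachable is a hypothetical helper name, and the object ID you pass would be the one you found earlier with git verify-pack.

```shell
# is_unreachable SHA: succeed if git fsck lists SHA as unreachable.
is_unreachable() {
    git fsck --full --unreachable 2>/dev/null |
        grep -q "^unreachable [a-z]* $1"
}
```

Usage: is_unreachable a0d770a97ff0fac0be1d777b32cc67fe69eb9a98 && echo "ready to repack and prune". If it fails, some ref or reflog entry is probably still holding on to the old history.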

4) Repack your packed archive(s)

git repack -A -d

This will ensure that the unreachable objects get unpacked and stay unpacked.

5) Prune loose (unreachable) objects

git prune

And that should do it. Git really should have a better way to manage packed refs. Maybe there is a better way that I don't know about. In the absence of a better way, manual editing of the packed-refs file might be the only way to go.
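
For reference, the whole sequence can be put into one function. Treat this as a sketch under assumptions rather than a definitive recipe: purge_path is a hypothetical helper name, the path must not contain single quotes, the work tree must be clean, and -- --all is added so that every local ref is rewritten. It uses --expire=now where the steps above use '0 days', and it replaces the manual packed-refs edit with git update-ref -d for refs/original/ only, so any other stale ref (for example a remote-tracking ref) still has to be removed separately. Try it on a copy of the repository first.

```shell
# purge_path PATH: remove PATH from history, then drop the old objects.
# Run from the top of the work tree.
purge_path() {
    target="$1"

    # Rewrite history so that no commit contains the path any more.
    # FILTER_BRANCH_SQUELCH_WARNING skips the warning pause in newer git.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --index-filter \
        "git rm -r --cached --ignore-unmatch -- '$target'" -- --all || return 1

    # 1) Drop the backup refs, whether they are loose or packed.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done

    # 2) Expire all reflog entries.
    git reflog expire --all --expire=now

    # 4) Repack; unreachable objects in old packs become loose.
    git repack -A -d

    # 5) Delete the loose, unreachable objects.
    git prune
}
```

Afterwards, git cat-file -e with the object ID of the removed blob should fail, which indicates that the object is no longer in the repository.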
