Bitbucket is alarming that my git repo is too large but I cannot confirm large files


Problem description

Bitbucket is warning me that my Git repository is over 1 GB; the repository details page actually says it is 1.7 GB. That's crazy. I must have included large data files in version control. My local copy is in fact 10 GB, which means that I have at least been using .gitignore with some success to exclude big files from version control.
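
For reference, the kind of .gitignore rules I mean, with made-up paths (adjust them to wherever the large data actually lives):

# hypothetical patterns keeping large data out of version control
data/
*.h5
*.zip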

Next, I followed the tutorial at https://confluence.atlassian.com/display/BITBUCKET/Reduce+repository+size and tried to delete unused large data. Running git count-objects -v at the top-level folder of my repo returned the following:

count: 5149
size: 1339824
in-pack: 11352
packs: 2
size-pack: 183607
prune-packable: 0
garbage: 0
size-garbage: 0

The size-pack of 183607 KB (roughly 180 MB) is much smaller than 1.7 GB. I was a bit perplexed.
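
As a side note, newer versions of Git can print these sizes in human-readable form, which makes the KiB-versus-GB comparison less error-prone (the output shown is illustrative):

git count-objects -vH
# e.g. prints "size-pack: 179.30 MiB" instead of a raw KiB count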

Next, I downloaded the BFG Repo Cleaner (https://rtyley.github.io/bfg-repo-cleaner) and ran java -jar bfg-1.12.3.jar --strip-blobs-bigger-than 100M at the top-level directory to remove files bigger than 100 MB from every commit except the latest one. However, BFG returned the following message:

Warning : no large blobs matching criteria found in packfiles 
- does the repo need to be packed?

Repeating the run with 50M gave the same result.
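
Since the warning asks whether the repo needs to be packed, one thing I considered (just a guess on my part, as BFG appears to scan packfiles only) was to pack any loose objects first and then retry:

git gc                                                   # packs loose objects into packfiles
java -jar bfg-1.12.3.jar --strip-blobs-bigger-than 50M   # retry after packing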

Does this mean that all the files larger than 50 MB are in the latest commit? In Bitbucket's source browser, I looked at the folders that contain large data files, but those files are not included (they were successfully ignored).

Could anyone briefly explain the source of the confusion about the repository size and the existence of large files in the repo?

Solution

At this point you would need to look at the repository on the server to know with certainty what the problem is, and you will likely need to talk to BitBucket technical support. But your description makes it sound like your repository has some garbage in it that can be cleaned up.

Consider if you had pushed some 500 MB file up to your BitBucket repository. Now you realize your error, and remove it from your repository in some way (BFG, for example) and push that updated ref. The ref on your remote will be updated to point to the new commit, and your repository will not appear to contain the big file (if you cloned your repository, you would not get the big file).

But the remote would not have gone and deleted the old commit or the old file in that commit. It would merely disconnect it from the graph, and that large file would no longer be "reachable". It would, in fact, be "garbage" eligible for "garbage collection". This would delete the big file and your repository size on the server would shrink.
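
On a repository you control directly, the cleanup the server would have to perform looks roughly like this (a sketch; Bitbucket's actual internals may differ):

git reflog expire --expire=now --all   # drop reflog entries that still reference the old commits
git gc --prune=now --aggressive        # repack and delete unreachable objects immediately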

There is no way to ask the server to GC (over the git protocol). BitBucket's support should be able to perform this for you:

You'll need to look for us to trigger the gc instead. I guess the best way is to "escalate" it if it is really urgent, and we should be able to get to it immediately. — Bitbucket Support (Dec. 2016)

Note that this assumes that you actually have the full repository locally; make sure to do a git fetch --all so that you are not working with only a subset of the (reachable) history. In the case of BFG, make sure you've cloned your repository with the --mirror option.
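
Putting the pieces together, a sketch of the full BFG round trip (the repository URL is a placeholder):

git clone --mirror git@bitbucket.org:user/repo.git        # bare mirror clone, placeholder URL
cd repo.git
java -jar bfg-1.12.3.jar --strip-blobs-bigger-than 100M   # rewrite history, dropping big blobs
git reflog expire --expire=now --all                      # local cleanup, as above
git gc --prune=now --aggressive
git push                                                  # push all rewritten refs back

Even after such a push, the server still holds the old objects until Bitbucket's support triggers a GC on their side, which is why the reported size may not shrink right away.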
