What are the file limits in Git (number and size)?


Question


Does anyone know what the Git limits are for number of files and size of files?

Solution

This message from Linus himself (http://osdir.com/ml/git/2009-05/msg00051.html) can help you with some other limits:

[...] CVS, ie it really ends up being pretty much oriented to a "one file at a time" model.

Which is nice in that you can have a million files, and then only check out a few of them - you'll never even see the impact of the other 999,995 files.

Git fundamentally never really looks at less than the whole repo. Even if you limit things a bit (ie check out just a portion, or have the history go back just a bit), git ends up still always caring about the whole thing, and carrying the knowledge around.

So git scales really badly if you force it to look at everything as one huge repository. I don't think that part is really fixable, although we can probably improve on it.

And yes, then there's the "big file" issues. I really don't know what to do about huge files. We suck at them, I know.

See more in my other answer: the limit with Git is that each repository must represent a "coherent set of files", the "whole system" in itself (you cannot tag just "part of a repository").
If your system is made of autonomous (but inter-dependent) parts, you must use submodules.
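
For illustration, splitting one large system into per-component repositories could look like the following sketch (the URLs and paths are made-up placeholders, not part of the original answer):

    # Add an independent component as a submodule of the parent repository
    git submodule add https://example.com/team/render-engine.git libs/render-engine
    git commit -m "Add render-engine as a submodule"

    # Later, clone the parent repository together with its submodules
    git clone --recurse-submodules https://example.com/team/main-app.git

Each submodule keeps its own history and its own tags, which is exactly what the "coherent set of files" constraint above asks for.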

As illustrated by Talljoe's answer, the limit can be a system one (large number of files), but if you do understand the nature of Git (about data coherency represented by its SHA-1 keys), you will realize the true "limit" is a usage one: i.e., you should not try to store everything in a Git repository unless you are prepared to always get or tag everything back. For some large projects, that would make no sense.


For a more in-depth look at git limits, see "git with large files"
(which mentions git-lfs: a solution to store large files outside the git repo. GitHub, April 2015)
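
As a rough sketch of how Git LFS is typically wired up (the tracked pattern and file name below are only examples, not taken from the answer):

    # One-time setup on each machine (requires the git-lfs extension to be installed)
    git lfs install

    # Ask LFS to manage a class of large files; this records the rule in .gitattributes
    git lfs track "*.psd"
    git add .gitattributes

    # Matching files are now stored as small pointer files in the repository,
    # while the real content lives on the LFS server
    git add design/mockup.psd
    git commit -m "Track Photoshop files with Git LFS"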

The three issues that limit a git repo (a mitigation sketch follows the list):

  • huge files (the xdelta for packfiles is held in memory only, which isn't good with large files)
  • a huge number of files, which means one blob per file, and a slow git gc that generates one packfile at a time.
  • huge packfiles, with a packfile index that is inefficient at retrieving data from the (huge) packfile.
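
If you do run into these pack-related limits, a few standard configuration knobs can soften them; the values below are purely illustrative, not recommendations from the answer:

    # Skip delta compression for very large blobs (the xdelta step is memory-hungry)
    git config core.bigFileThreshold 50m

    # Cap the memory used per thread while repacking
    git config pack.windowMemory 256m

    # Split the result into several smaller packfiles instead of one huge one
    git config pack.packSizeLimit 2g

    # Repack with the settings above
    git repack -a -d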

A more recent thread (Feb. 2015) illustrates the limiting factors for a Git repo:

Will a few simultaneous clones from the central server also slow down other concurrent operations for other users?

There are no locks in the server when cloning, so in theory cloning does not affect other operations. Cloning can use lots of memory though (and a lot of CPU, unless you turn on the reachability bitmap feature, which you should).
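
Assuming a bare repository on the server side, enabling reachability bitmaps is essentially one config flag plus a repack (a sketch, not something spelled out in the thread):

    # On the server's bare repository
    git config repack.writeBitmaps true
    git repack -a -d

    # Equivalently, request the bitmap index explicitly for a single repack
    git repack -a -d --write-bitmap-index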

Will 'git pull' be slow?

If we exclude the server side, the size of your tree is the main factor, but your 25k files should be fine (linux has 48k files).

'git push'?

This one is not affected by how deep your repo's history is, or how wide your tree is, so it should be quick.

Ah the number of refs may affect both git-push and git-pull.
I think Stefan knows better than I in this area.
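
If you suspect the number of refs is the problem, a quick diagnostic (not from the thread) is to count them:

    # Count every ref the repository carries (branches, tags, remote-tracking refs, ...)
    git for-each-ref | wc -l

    # Or only remote-tracking branches, which tend to accumulate on busy servers
    git for-each-ref refs/remotes | wc -l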

'git commit'? (It is listed as slow in reference 3.) 'git status'? (Slow again in reference 3 though I don't see it.)
(also git-add)

Again, the size of your tree. At your repo's size, I don't think you need to worry about it.

Some operations might not seem to be day-to-day but if they are called frequently by the web front-end to GitLab/Stash/GitHub etc then they can become bottlenecks. (e.g. 'git branch --contains' seems terribly adversely affected by large numbers of branches.)

git-blame could be slow when a file is modified a lot.
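
A common workaround when blame is slow on a heavily modified file is to narrow what it has to look at; the file name, line range, and tag below are placeholders:

    # Blame only lines 100-160 instead of the whole file
    git blame -L 100,160 src/parser.c

    # Bound the revision range so blame stops digging at v2.0
    git blame v2.0..HEAD -- src/parser.c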
