What are the file limits in Git (number and size)?


Problem description


Does anyone know what the Git limits are for the number of files and the size of files?

Solution

This message from Linus himself can help you with some other limits:

[...] CVS, ie it really ends up being pretty much oriented to a "one file at a time" model.

Which is nice in that you can have a million files, and then only check out a few of them - you'll never even see the impact of the other 999,995 files.

Git fundamentally never really looks at less than the whole repo. Even if you limit things a bit (ie check out just a portion, or have the history go back just a bit), git ends up still always caring about the whole thing, and carrying the knowledge around.

So git scales really badly if you force it to look at everything as one huge repository. I don't think that part is really fixable, although we can probably improve on it.

And yes, then there's the "big file" issues. I really don't know what to do about huge files. We suck at them, I know.

See more in my other answer: the limit with Git is that each repository must represent a "coherent set of files", the "whole system" in itself (you cannot tag just "part of a repository").
If your system is made of autonomous (but inter-dependent) parts, you must use submodules.
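
For illustration, a minimal sketch of wiring an autonomous part in as a submodule (the URL and path below are hypothetical):

    # register the autonomous part as a submodule of the parent repo
    git submodule add https://example.com/team/big-component.git big-component
    git commit -m "Add big-component as a submodule"

    # after cloning the parent repo, fetch the submodule contents
    git submodule update --init --recursive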

As illustrated by Talljoe's answer, the limit can be a system one (a large number of files), but if you understand the nature of Git (data coherency represented by its SHA-1 keys), you will realize the true "limit" is a usage one: that is, you should not try to store everything in a Git repository unless you are prepared to always get or tag everything back. For some large projects, that would make no sense.


For a more in-depth look at git limits, see "git with large files"
(which mentions git-lfs: a solution to store large files outside the git repo. GitHub, April 2015)
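
As a rough sketch of how git-lfs is typically wired in (the tracked pattern and file name are just examples):

    # one-time setup per machine
    git lfs install

    # route matching files through LFS via .gitattributes
    git lfs track "*.psd"
    git add .gitattributes design.psd
    git commit -m "Store Photoshop files via Git LFS"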

The three issues that limit a git repo (some mitigating settings are sketched after this list):

  • huge files (the xdelta for packfiles is held entirely in memory, which does not work well with large files)
  • a huge number of files, which means one blob per file and a slow git gc when generating one packfile at a time
  • huge packfiles, where the packfile index becomes inefficient at retrieving data from the (huge) packfile
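
Some pack-related settings can soften the first and third points; the values below are only illustrative:

    # do not delta-compress files above this size (the default is 512 MiB)
    git config core.bigFileThreshold 100m

    # cap the memory each pack-objects thread may use for its delta window
    git config pack.windowMemory 256m

    # split oversized packs into several smaller packfiles
    git config pack.packSizeLimit 2g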

A more recent thread (Feb. 2015) illustrates the limiting factors for a Git repo:

Will a few simultaneous clones from the central server also slow down other concurrent operations for other users?

There are no locks on the server when cloning, so in theory cloning does not affect other operations. Cloning can use lots of memory though (and a lot of CPU, unless you turn on the reachability bitmap feature, which you should).
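
Turning on reachability bitmaps is a one-line configuration on the server-side (bare) repository; a sketch:

    # on the server's bare repository
    git config repack.writeBitmaps true
    git repack -a -d    # the rewritten pack now ships with a .bitmap index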

Will 'git pull' be slow?

If we exclude the server side, the size of your tree is the main factor, but your 25k files should be fine (the Linux kernel has 48k files).
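
To see where your own repository stands, two stock commands report the tracked-file count and the object-store size:

    git ls-files | wc -l      # number of tracked files in the current tree
    git count-objects -vH     # loose/packed object counts and on-disk sizes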

'git push'?

This one is not affected by how deep your repo's history is, or how wide your tree is, so it should be quick.

Ah, the number of refs may affect both git-push and git-pull.
I think Stefan knows better than I do in this area.
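
Counting refs is cheap if you want to check this factor (assuming a remote named origin):

    git for-each-ref | wc -l       # refs in the local repository
    git ls-remote origin | wc -l   # refs advertised by the remote on every fetch/push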

'git commit'? (It is listed as slow in reference 3.) 'git status'? (Slow again in reference 3 though I don't see it.)
(also git-add)

Again, the size of your tree. At your repo's size, I don't think you need to worry about it.
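
If git status does become sluggish on a very large working tree, newer Git versions expose caching knobs; a hedged sketch (availability depends on your Git version):

    # remember untracked-file scan results between runs of git status
    git config core.untrackedCache true

    # opt into a bundle of defaults tuned for working trees with many files
    git config feature.manyFiles true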

Some operations might not seem to be day-to-day but if they are called frequently by the web front-end to GitLab/Stash/GitHub etc then they can become bottlenecks. (e.g. 'git branch --contains' seems terribly adversely affected by large numbers of branches.)

git-blame could be slow when a file is modified a lot.
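
A common workaround is to blame only the region you care about instead of the whole file; the path and line range below are just examples:

    # annotate only lines 100-150 of one heavily modified file
    git blame -L 100,150 path/to/heavily-modified-file.c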
