Git - repository and file size limits


Problem description

I've read in various places on the internet that Git does not handle large files very well, and that Git also seems to have problems with large overall repository sizes. This seems to have spawned projects like git-annex, git-media, git-fat, git-bigfiles, and probably even more...

However, after reading Git-Internals, it looks to me like Git's pack-file concept should solve all the problems with large files.

Q1: What's the fuss about large files in Git?

Q2: What's the fuss about Git and large repositories?

Q3: If we have a project with two binary dependencies (e.g. around 25 DLL files, each around 500KB to 1MB) which are updated on a monthly basis, is this really going to be a problem for Git? Is only the initial clone going to be a long process, or is everyday work with the repository (e.g. branch changes, commits, pulls, pushes, etc.) going to be a problem?

Solution

In a nutshell, today's computers are bad with large files. Moving megabytes around is pretty fast, but gigabytes take time. Only specialized tools are ready to handle gigabytes of data, and Git isn't one of them.

More specific to Git: Git compares files all the time. If the files are small (a few KB), these operations are fast. If they are huge, Git has to compare many, many bytes, and that takes time, memory, and nerves.
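A quick way to see this on your own machine is to commit two versions of a large, incompressible file and check how much the repository grows. This is a minimal sketch, assuming a Linux shell with GNU dd; the exact sizes will vary with your Git version and settings:

    git init bigfile-demo && cd bigfile-demo        # hypothetical throwaway repo
    dd if=/dev/urandom of=big.bin bs=1M count=100   # 100 MB of random data
    git add big.bin && git commit -m "v1"
    dd if=/dev/urandom of=big.bin bs=1M count=100   # replace with fresh random data
    git add big.bin && git commit -m "v2"
    git gc --aggressive                             # ask Git to delta-compress the history
    git count-objects -vH                           # pack size is ~200 MB

With two text files of that size, the second version would mostly delta against the first; random binary data defeats both zlib and delta compression, so each version is stored nearly whole, and every add, commit, and gc has to chew through all of it.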

The projects you list add special handling for large files, such as saving them in individual blobs without trying to compare them to previous versions. That makes everyday operations faster, but at the cost of repository size. And for some operations Git needs free disk space on the order of the repository size, or you'll get errors (and possibly a corrupted repository, since these code paths are the least tested).
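Tools like git-media and git-fat do this through Git's clean/smudge filter mechanism: the clean filter replaces the big file with a small pointer before it is stored, and the smudge filter swaps the real content back in on checkout. A minimal sketch of the wiring, where the filter name and the store-blob/fetch-blob scripts are illustrative placeholders rather than any tool's actual commands:

    # Route all *.dll files through a custom filter driver
    echo "*.dll filter=bigfiles" >> .gitattributes
    # clean runs on add/commit, smudge runs on checkout;
    # store-blob and fetch-blob are hypothetical scripts you provide
    git config filter.bigfiles.clean  "store-blob"
    git config filter.bigfiles.smudge "fetch-blob"

The repository itself then only ever sees the tiny pointer blobs, which is what keeps everyday operations fast while the real content lives elsewhere.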

Lastly, the initial clone will take a long time.
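If the initial clone is the main pain point, a shallow clone fetches only the most recent history (with the caveat that shallow repositories, especially in older Git versions, carry restrictions on fetching and pushing):

    git clone --depth 1 https://example.com/project.git   # hypothetical URL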

Regarding Q3: Git isn't a backup tool. You probably don't want to be able to get the DLL from ten years ago, ever.
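For a rough sense of scale in your case (an upper bound, assuming the DLLs neither compress nor delta well): 25 files × ~1 MB × 12 updates per year ≈ 300 MB of added history per year. That is inconvenient, and it never shrinks because every clone carries the full history, but it is well short of the gigabyte range where Git really struggles.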

Put the sources for those libraries under Git, and use a backup/release process to handle the binaries (like keeping the last 12 months' worth on some network drive).
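A minimal sketch of that split, with hypothetical paths you would adapt to your own release process:

    # Keep the binaries out of version control
    echo "*.dll" >> .gitignore
    git add .gitignore && git commit -m "Ignore binary dependencies"
    # Fetch the current month's build from the release share instead
    cp /mnt/releases/deps/2014-06/*.dll lib/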
