How does Git save space and stay fast at the same time?


Question

I just saw the first Git tutorial at http://blip.tv/play/Aeu2CAI.

How does Git store all the versions of all the files, and how can it still be more economical in space than Subversion, which saves only the latest version of the code?

I know this can be done using compression, but that would come at the cost of speed, and yet the tutorial also says that Git is much faster (though where it gains the most is the fact that most of its operations are offline).

So, my guess is that

• Git compresses data extensively
• It is still faster because uncompression + work is still faster than network_fetch + work

Am I correct? Even close?

Answer

I assume you are asking how it is possible for a git clone (full repository + checkout) to be smaller than checked-out sources in Subversion. Or did you mean something else?

(This question is answered in the comments.)


Repository size

First, you should take into account that alongside the checkout (working version), Subversion stores a pristine copy (last version) in those .svn subdirectories. The pristine copy is stored uncompressed in Subversion.

Second, Git uses the following techniques to make the repository smaller:

• Each version of a file is stored only once; this means that if you have only two different versions of some file in 10 revisions (10 commits), Git stores only those two versions, not 10.
• Objects (and deltas, see below) are stored compressed; text files used in programming compress really well (to around 60% of the original size, i.e. a 40% reduction in size from compression).
• After repacking, objects are stored in deltified form, as a difference from some other version; additionally, Git tries to order delta chains in such a way that the delta consists mainly of deletions (in the usual case of growing files, that is recency order). IIRC deltas are compressed as well.
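The first two points above can be sketched in a few lines of Python. This is a toy model, not Git's actual implementation: `ToyObjectStore`, `blob_payload`, and `blob_id` are hypothetical names, and the sketch assumes a repository using Git's traditional SHA-1 object IDs. It relies only on the fact that a blob's ID is the hash of a `blob <size>` header, a NUL byte, and the file content, and that loose objects are zlib-compressed. Delta storage in packfiles is not modelled here.

```python
import hashlib
import zlib


def blob_payload(content: bytes) -> bytes:
    """Build the bytes that get hashed: 'blob <size>', a NUL byte, then the content."""
    return b"blob " + str(len(content)).encode("ascii") + b"\x00" + content


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID of a blob with this content."""
    return hashlib.sha1(blob_payload(content)).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory stand-in for a Git object database."""

    def __init__(self):
        self.objects = {}  # object ID -> zlib-compressed payload

    def add(self, content: bytes) -> str:
        object_id = blob_id(content)
        # Identical content always yields the same ID, so it is stored once.
        if object_id not in self.objects:
            self.objects[object_id] = zlib.compress(blob_payload(content))
        return object_id

    def read(self, object_id: str) -> bytes:
        payload = zlib.decompress(self.objects[object_id])
        # Drop the header, which ends at the first NUL byte.
        return payload.split(b"\x00", 1)[1]


if __name__ == "__main__":
    store = ToyObjectStore()
    version_a = b"def main():\n    print('hello')\n" * 50
    version_b = version_a + b"main()\n"
    # Ten "commits" that only ever contain one of two file versions.
    for commit in range(10):
        store.add(version_a if commit < 4 else version_b)
    print(len(store.objects), "objects stored for 10 commits")
    stored = sum(len(blob) for blob in store.objects.values())
    print(stored, "compressed bytes vs", len(version_a) + len(version_b), "raw bytes")
```

Because the ID depends only on the content, adding the same file version in a later commit is a dictionary lookup rather than a new copy.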

Performance (speed of operations)

First, any operation that involves the network is much slower than a local operation. For example, comparing the current state of the working area with some other version, or getting a log (a history), involves a network connection and network transfer in Subversion but is a local operation in Git, so it is of course much slower in Subversion than in Git. By the way, this is a difference between centralized version control systems (using a client-server workflow) and distributed version control systems (using a peer-to-peer workflow) in general, not only between Subversion and Git.

Second, if I understand it correctly, nowadays the limitation is not CPU but IO (disk access). Therefore it is possible that the gain from having to read less data from disk because of compression (and being able to mmap it in memory) outweighs the loss from having to decompress the data.
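A minimal sketch of the mechanism behind this second point, using only the Python standard library: compressed data occupies fewer bytes on disk, and a memory-mapped file can be handed straight to the decompressor. The helper names `write_compressed` and `read_via_mmap` are made up for illustration, and the sketch makes no timing claims; whether the trade pays off depends on the disk, the CPU, and the data.

```python
import mmap
import os
import tempfile
import zlib


def write_compressed(path: str, data: bytes) -> int:
    """Write zlib-compressed data to path and return the number of bytes on disk."""
    compressed = zlib.compress(data)
    with open(path, "wb") as handle:
        handle.write(compressed)
    return len(compressed)


def read_via_mmap(path: str) -> bytes:
    """Memory-map a compressed file and decompress its contents."""
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; the mapping is read-only.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return zlib.decompress(mapped)


if __name__ == "__main__":
    source = b"for (i = 0; i < n; i++) { total += values[i]; }\n" * 2000
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "object.z")
        on_disk = write_compressed(path, source)
        restored = read_via_mmap(path)
    print("bytes read from disk:", on_disk)
    print("bytes after decompression:", len(restored))
```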

Third, Git was designed with performance in mind (see e.g. the GitHistory page on the Git Wiki):

• The index stores stat information for files, and Git uses it to decide, without examining the file contents, whether the files were modified (see e.g. the core.trustctime config variable).
• The maximum delta depth is limited to pack.depth, which defaults to 50. Git has a delta cache to speed up access. There is a (generated) packfile index for fast access to objects in a packfile.
• Git takes care not to touch files it doesn't have to. For example, when switching branches or rewinding to another version, Git updates only the files that changed. A consequence of this philosophy is that Git supports only very minimal keyword expansion (at least out of the box).
• Git uses its own version of the LibXDiff library, nowadays also for diff and merge, instead of calling an external diff / external merge tool.
• Git tries to minimize latency, which means good perceived performance. For example, it outputs the first page of "git log" as fast as possible, and you see it almost immediately, even if generating the full history would take more time; it doesn't wait for the full history to be generated before displaying it.
• When fetching new changes, Git checks what objects you have in common with the server, and sends only (compressed) differences in the form of a thin packfile. Admittedly, Subversion can also send only differences when updating (or perhaps it does so by default).
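The first bullet, deciding whether a file changed from cached stat information alone, can be illustrated with a rough sketch. `ToyIndex` and `stat_signature` are hypothetical names, and comparing only size and modification time is a simplifying assumption; the real index records more stat fields than this.

```python
import os
import tempfile


def stat_signature(path: str) -> tuple:
    """Return cheap metadata for a file without reading its contents."""
    info = os.stat(path)
    return (info.st_size, info.st_mtime_ns)


class ToyIndex:
    """Hypothetical cache of stat signatures, loosely modelled on the Git index."""

    def __init__(self):
        self.entries = {}  # path -> signature recorded when the file was last staged

    def record(self, path: str) -> None:
        self.entries[path] = stat_signature(path)

    def maybe_modified(self, path: str) -> bool:
        # A matching signature lets us skip reading and hashing the file.
        return self.entries.get(path) != stat_signature(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "notes.txt")
        with open(path, "w") as handle:
            handle.write("first draft\n")
        index = ToyIndex()
        index.record(path)
        print("modified after recording:", index.maybe_modified(path))
        with open(path, "a") as handle:
            handle.write("second line\n")
        print("modified after appending:", index.maybe_modified(path))
```

Only files whose signature differs need to be opened and compared, which is why a status check on a large, mostly unchanged tree can stay cheap.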

I am not a Git hacker, and I have probably missed some techniques and tricks that Git uses for better performance. Note, however, that Git relies heavily on POSIX features (such as memory-mapped files) for that, so the gain might not be as large on MS Windows.
