Can git use patch/diff based storage?


Problem description

As I understand it, git stores full files of each revision committed. Even though it's compressed, there's no way that can compete with, say, storing compressed patches against one original revision full file. It's especially an issue with poorly compressible binary files like images, etc.

Is there a way to make git use a patch/diff based backend for storing revisions?

I get why the main use case of git does it the way it does, but I have a particular use case where I would like to use git if I could, but it would take up too much space.

Thanks

Solution

Git does use diff based storage, silently and automatically, under the name "delta compression". It applies only to files that are "packed", and packs don't happen after every operation.
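The difference between loose and packed objects can be observed in a throwaway repository. The following is a minimal sketch, assuming `git`, `seq`, `awk` and `mktemp` are installed; the file name `data.txt`, the commit messages and the demo identity are arbitrary choices made for illustration, not anything from the question.

```shell
#!/bin/sh
set -eu

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

# Commit two nearly identical revisions of a text file (roughly 100 KB).
seq 1 20000 > data.txt
git add data.txt
git commit -q -m "v1"
echo "one more line" >> data.txt
git commit -q -a -m "v2"

# Before packing, every object is a loose, individually compressed file.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# Pack everything; delta compression is applied during this step.
git gc -q

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
echo "loose before gc: $loose_before, loose after gc: $loose_after, packed: $packed"

# In verify-pack output, objects stored as deltas carry two extra
# columns (delta depth and base object), giving 7 fields instead of 5.
listing=$(git verify-pack -v .git/objects/pack/pack-*.idx)
printf '%s\n' "$listing" | awk 'NF == 7'
deltas=$(printf '%s\n' "$listing" | awk 'NF == 7 {n++} END {print n + 0}')
echo "objects stored as deltas: $deltas"
```

The fourth column of each printed line is the size the object occupies inside the pack, which for a delta is typically far smaller than the full object size in the third column.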

• git-repack docs:

  A pack is a collection of objects, individually compressed, with delta compression applied, stored in a single file, with an associated index file.

• Git Internals - Packfiles:

  You have two nearly identical 22K objects on your disk. Wouldn’t it be nice if Git could store one of them in full but then the second object only as the delta between it and the first?

  It turns out that it can. The initial format in which Git saves objects on disk is called a "loose" object format. However, occasionally Git packs up several of these objects into a single binary file called a "packfile" in order to save space and be more efficient. Git does this if you have too many loose objects around, if you run the git gc command manually, or if you push to a remote server.

  Later:

  The really nice thing about this is that it can be repacked at any time. Git will occasionally repack your database automatically, always trying to save more space, but you can also manually repack at any time by running git gc by hand.

• "The woes of git gc --aggressive" (Dan Farina), which describes that delta compression is a byproduct of object storage and not revision history:

  Git does not use your standard per-file/per-commit forward and/or backward delta chains to derive files. Instead, it is legal to use any other stored version to derive another version. Contrast this to most version control systems where the only option is simply to compute the delta against the last version. The latter approach is so common probably because of a systematic tendency to couple the deltas to the revision history. In Git the development history is not in any way tied to these deltas (which are arranged to minimize space usage) and the history is instead imposed at a higher level of abstraction.

  Later, quoting Linus, about the tendency of git gc --aggressive to throw out old good deltas and replace them with worse ones:

  So the equivalent of "git gc --aggressive" - but done properly - is to do (overnight) something like

    git repack -a -d --depth=250 --window=250
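
To check what such a repack does for a particular repository, the pack size can be compared before and after. The sketch below builds its own small test history so that it is self-contained; the helper name `pack_kib`, the file name and the 20-revision history are assumptions made for illustration. On a real repository only the `git count-objects` and `git repack` lines would be needed.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a short history: 20 revisions of a growing file.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

i=1
while [ "$i" -le 20 ]; do
    seq 1 $((i * 500)) > data.txt
    git add data.txt
    git commit -q -m "revision $i"
    i=$((i + 1))
done

# Hypothetical helper: size of all packs in KiB, as reported by git.
pack_kib() { git count-objects -v | awk '/^size-pack:/ {print $2}'; }

# Baseline: an ordinary gc with default settings.
git gc -q
size_default=$(pack_kib)

# The repack quoted above: -a puts everything into a single pack,
# -d deletes the packs made redundant, and --depth/--window allow
# longer delta chains and more candidate base objects per object.
git repack -q -a -d --depth=250 --window=250
size_deep=$(pack_kib)

packs=$(git count-objects -v | awk '/^packs:/ {print $2}')
in_pack=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
echo "after git gc: ${size_default} KiB, after deep repack: ${size_deep} KiB"
echo "packs: $packs, objects in pack: $in_pack"
```

By default `git repack` reuses deltas that already exist in a pack, so on a small or freshly packed repository the two sizes may be identical; adding `-f` makes it recompute deltas from scratch, at the cost of more CPU time.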
      
