是git二进制diff算法(delta存储)标准化吗? [英] Is the git binary diff algorithm (delta storage) standardized?

查看:581
本文介绍了是git二进制diff算法(delta存储)标准化吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Git使用增量压缩存储类似于彼此的对象。



此算法是否标准化,并在其他工具中使用?是否有描述格式的文档?它与xdelta / VCDIFF / RFC 3284兼容吗?

解决方案

我认为diff algo用于打包文件已链接到增量编码(2005)xdelta a>,然后 libXDiff

但是,如下所述,它转向了一个自定义实现。



无论如何,如在此处提及


Git仅在packfiles中修改。

但是当你通过SSH push时,git会生成一个包含提交的包文件,另一方没有
,这些包是瘦包,所以他们也有deltas ...但是远程端然后添加基础


(注意:创建许多packfiles或检索巨大packfile中的信息代价高昂,并解释为什么git不能处理很好的大文件或大型仓库。

查看 git with大文件



此主题还提醒我们:


实际上,打包文件和增量( LibXDiff,而不是xdelta 我原先因为网络带宽(比磁盘空间成本高得多)和 I / O性能使用单个mmapped文件而不是非常大的松散对象的数量。


2008 thread



但是,从那时起,算法已经演变,可能在自定义的一个,因为这 2011年线程说明,并标记为 diff-delta.c 指出:


Git中的当前代码与libxdiff代码完全不相似。

但是这两种实现的基本算法是相同的


学习libxdiff版本可能更容易理解这是如何工作的。




  / * 
* diff-delta.c:缓冲区
*
*此代码极大地受到来自Davide Libenzi的LibXDiff部分的启发
* http://www.xmailserver.org/xdiff-lib.html
*
*由Nicolas Pitre重写为GIT由Nicolas Pitre< nico@fluxnic.net>,(C)2005-2007
* /

有关 packfiles的Git图书的更多信息




Git uses a delta compression to store objects that are similar to each-other.

Is this algorithm standardized and used in other tools as well? Is there documentation describing the format? Is it compatible with xdelta/VCDIFF/RFC 3284?

解决方案

I think the diff algo used for pack files was linked to one of the delta encoding out there: initially (2005) xdelta, and then libXDiff.
But then, as detailed below, it shifted to a custom implementation.

Anyway, as mentioned here:

Git does deltification only in packfiles.
But when you push via SSH git would generate a pack file with commits the other side doesn't have, and those packs are thin packs, so they also have deltas... but the remote side then adds bases to those thin packs making them standalone.

(note: creating many packfiles, or retrieving information in huge packfile is costly, and explain why git doesn't handle well huge files or huge repo.
See more at "git with large files")

This thread also reminds us:

Actually packfiles and deltification (LibXDiff, not xdelta) was, from what I remember and understand, originally because of network bandwidth (which is much more costly than disk space), and I/O performance of using single mmapped file instead of very large number of loose objects.

LibXDiff is mentioned in this 2008 thread.

However, since then, the algo has evolved, probably in a custom one, as this 2011 thread illustrates, and as the header of diff-delta.c points out:

So, strictly speaking, the current code in Git doesn't bear any resemblance with the libxdiff code at all.
However the basic algorithm behind both implementations is the same
.
Studying the libxdiff version is probably easier in order to gain an understanding of how this works.

/*
 * diff-delta.c: generate a delta between two buffers
 *
 * This code was greatly inspired by parts of LibXDiff from Davide Libenzi
 * http://www.xmailserver.org/xdiff-lib.html
 *
 * Rewritten for GIT by Nicolas Pitre <nico@fluxnic.net>, (C) 2005-2007
 */

More on the packfiles the Git Book:

这篇关于是git二进制diff算法(delta存储)标准化吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆