Binary diff

Question

Hi all,

I'm trying to create a tool that checks the delta (diff) of two
binaries and creates a delta binary. I use the binary
formatter (serialization) to create the delta binary. It
works fine, but the delta binary is pretty huge. I
have a 1-byte file and a 2-byte file; the delta should be 1
byte, but somehow it turns out to be 249 bytes using the binary
formatter. I guess serialization adds some other things
to the delta file.

Is there a better way to create a delta binary from two
given files, such that the delta can still be used to
reconstruct the original file? My current solution, using
the serialization binary formatter, is great and easy,
but the size of the delta file... *sigh*

Please advise, thanks!
-CL

Solution

Binary diff is a tough problem. So far, the best paper I have seen on the
subject is Andrew Tridgell's paper on rsync. It actually tackles a slightly
more difficult problem than plain binary diff, namely a binary diff between a
local and a remote file, but the general idea (rolling checksum + strong
checksum) can be used to compute a binary diff between two local files
efficiently.

Andrew's paper is at http://samba.org/rsync/tech_report/
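To make the rolling-checksum idea concrete, here is a minimal Python sketch of the rsync approach applied to two local files. It is not Tridgell's code: the block size of 4, SHA-256 as the strong checksum (the paper used MD4), and the `("copy", ...)` / `("literal", ...)` op format are illustrative assumptions. The old file is cut into fixed blocks indexed by a cheap weak checksum; a window then slides over the new file, the weak checksum is updated in O(1) per byte, and only weak-checksum hits are confirmed with the strong hash.

```python
import hashlib

MOD = 1 << 16  # the weak checksum keeps two 16-bit running sums


def weak_checksum(block):
    """rsync-style weak checksum (a, b) of one block."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, size):
    """Slide a window of `size` bytes one position right in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b


def strong(block):
    # SHA-256 is a stand-in for the paper's strong checksum (MD4).
    return hashlib.sha256(block).digest()


def make_delta(old, new, block_size=4):
    """Return a list of ("copy", start, length) / ("literal", bytes) ops."""
    # Index each non-overlapping full block of `old` by its weak checksum.
    table = {}
    for start in range(0, len(old) - block_size + 1, block_size):
        block = old[start:start + block_size]
        table.setdefault(weak_checksum(block), []).append((strong(block), start))

    ops, literal = [], bytearray()
    i, a, b = 0, None, None
    while i + block_size <= len(new):
        if a is None:  # (re)start the rolling checksum at position i
            a, b = weak_checksum(new[i:i + block_size])
        match = None
        candidates = table.get((a, b))
        if candidates:  # only compute the strong hash on a weak hit
            digest = strong(new[i:i + block_size])
            match = next((s for h, s in candidates if h == digest), None)
        if match is not None:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match, block_size))
            i += block_size
            a = None
        else:
            literal.append(new[i])
            if i + block_size < len(new):
                a, b = roll(a, b, new[i], new[i + block_size], block_size)
            i += 1
    literal += new[i:]  # tail shorter than one block
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def apply_delta(old, ops):
    """Rebuild `new` from `old` and the op list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += old[start:start + length]
        else:
            out += op[1]
    return bytes(out)
```

For example, diffing `b"the quick brown fox jumps over the lazy dog"` against `b"the quick brown cat jumps over the lazy dog!"` yields mostly copy ops plus two short literals (`b"cat "` and `b"dog!"`), and `apply_delta` rebuilds the new file exactly.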

I don't understand how binary serialization would help you implement binary
diff. It seems like it is only going to make matters worse, but maybe you want
to do something slightly different than standard binary diff.

Bruno.


Instead of serializing the byte[] diff, why can't you just write it directly
to a binary diff file (i.e. with no serialization)?

--
William Stacey, DNS MVP
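Following this suggestion, the delta can be written as raw bytes in a tiny fixed layout instead of going through a serializer. The layout below (little-endian uint32 length of the new file, uint32 entry count, then a uint32 offset plus one byte per changed position) is a made-up example, not an existing format, and `encode_delta` / `decode_delta` are hypothetical names; Python's `struct` module stands in for whatever binary writer your platform provides.

```python
import struct

HEADER = struct.Struct("<II")  # hypothetical layout: new file length, entry count
ENTRY = struct.Struct("<IB")   # hypothetical layout: offset, replacement byte


def encode_delta(new_length, diffs):
    """Pack {offset: byte} plus the new file length into raw bytes."""
    parts = [HEADER.pack(new_length, len(diffs))]
    for offset in sorted(diffs):
        parts.append(ENTRY.pack(offset, diffs[offset]))
    return b"".join(parts)


def decode_delta(data):
    """Inverse of encode_delta: return (new_length, {offset: byte})."""
    new_length, count = HEADER.unpack_from(data, 0)
    diffs = {}
    pos = HEADER.size
    for _ in range(count):
        offset, value = ENTRY.unpack_from(data, pos)
        diffs[offset] = value
        pos += ENTRY.size
    return new_length, diffs
```

For the 1-byte vs 2-byte case from the question, the delta holds a single entry, so `encode_delta(2, {1: 0x42})` is 8 header bytes + 5 entry bytes = 13 bytes. Variable-length integers for offsets, or grouping runs of consecutive changed bytes, would shrink it further.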


Hi Bruno,

"...I don't understand how binary serialization would help
you implement binary diff..."

I compare each byte of file A's byte[] with the corresponding
byte of file B's byte[]. If they differ, I store the byte diff
in a hashtable (key = offset, value = byte diff). There are
some other cases to consider, e.g. when the two files are
not the same size. Once done, I serialize the hashtable.
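A minimal Python sketch of the per-byte comparison described here, with a dict standing in for the hashtable. How the "sizes differ" case is handled is an assumption: offsets past the end of file A are always recorded, and the length of file B is returned alongside the dict so that a shorter file B can be rebuilt by truncation. `make_byte_diff` is a hypothetical name.

```python
def make_byte_diff(a, b):
    """Return (len(b), {offset: byte}) for every offset where b differs from a.

    Assumed size handling: offsets beyond the end of `a` are always stored,
    and len(b) is kept so a shorter `b` can be recovered by truncation.
    """
    diffs = {}
    for offset, value in enumerate(b):
        if offset >= len(a) or a[offset] != value:
            diffs[offset] = value
    return len(b), diffs
```

One consequence of comparing offset by offset: a single byte inserted near the start of file B shifts every later byte, so nearly every later offset ends up in the dict. Block matching with a rolling checksum, as in the rsync paper, avoids this.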

To reconstruct file B from file A with this delta binary,
I deserialize the hashtable and do a byte comparison between
file A and the hashtable, with one assumption: that file A
is the same file A I created the binary diff from.
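The reconstruction step can be sketched the same way. This assumes the delta is a dict mapping offset to byte, as described here, plus the length of file B (an added assumption to cover files of different sizes). As stated, it also relies on file A being exactly the file the delta was built from; `apply_byte_diff` is a hypothetical name and does not check that.

```python
def apply_byte_diff(a, new_length, diffs):
    """Rebuild file B from file A plus an {offset: byte} delta.

    Assumes `a` is byte-for-byte the file A the delta was made from;
    nothing here detects a mismatch.
    """
    out = bytearray(a[:new_length])               # truncate if B is shorter
    out.extend(b"\x00" * (new_length - len(out)))  # placeholders if B is longer
    for offset, value in diffs.items():
        out[offset] = value                        # every differing byte is stored
    return bytes(out)
```

Storing a hash of file A inside the delta would let the reader detect when that assumption is violated instead of silently producing a wrong file B.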

"...but maybe you want to do something slightly different
than standard binary diff..."

I don't know much about standard binary diff, but I assume
it's all described in Andrew's paper. I'll check it
out soon.

Thanks!
-CL
