Binary Delta Storage

Problem Description

I'm looking for a binary delta storage solution to version large binary files (digital audio workstation files).

When working with DAW files, the majority of changes, especially near the end of the mix, are very small compared with the huge amount of data used to store the raw audio (waves).

It would be great to have a versioning system for our DAW files, allowing us to roll back to older versions.

The system would only save the difference (diff) between the binary files of each version. This would give us a list of instructions for changing from the current version back to a previous version, without storing the full file for every single version.
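
The core idea can be sketched in a few lines, although production delta tools (xdelta, rsync, bsdiff and the like) use much smarter, content-aware matching. Below is a minimal, illustrative Python sketch of a fixed-size block delta; the function names and block size are made up for illustration and are not any existing tool's API.

# Minimal sketch: store only the blocks that changed between two versions of
# a binary file, plus the new length, and rebuild the new version from the
# old one. Illustrative only; real delta tools are far more sophisticated.

BLOCK_SIZE = 64 * 1024  # 64 KiB blocks; an arbitrary, tunable choice

def make_delta(old: bytes, new: bytes) -> list[tuple[int, bytes]]:
    """Return a list of (offset, data) patches that turn `old` into `new`."""
    patches = []
    for offset in range(0, len(new), BLOCK_SIZE):
        new_block = new[offset:offset + BLOCK_SIZE]
        if new_block != old[offset:offset + BLOCK_SIZE]:  # keep changed blocks only
            patches.append((offset, new_block))
    patches.append((len(new), b""))  # record the final length of the new file
    return patches

def apply_delta(old: bytes, patches: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the new version from the old bytes plus the stored patches."""
    new_len = patches[-1][0]
    out = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for offset, data in patches[:-1]:
        out[offset:offset + len(data)] = data
    return bytes(out)

if __name__ == "__main__":
    old = bytes(1_000_000)                        # stand-in for a large DAW file
    new = bytearray(old)
    new[500_000:500_015] = b"a small change!"     # a tiny edit in the middle
    new += b"appended mix data"                   # plus some appended data
    delta = make_delta(old, bytes(new))
    stored = sum(len(data) for _, data in delta)
    print(f"delta stores {stored:,} bytes instead of {len(new):,}")
    assert apply_delta(old, delta) == bytes(new)

For a pair of versions that differ only in a small region, the stored delta is a tiny fraction of the full file, which is exactly the saving being asked about here.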

Are there any current versioning systems that do this? I've read that SVN uses binary diffs to save space in the repo... but I've also read that it only actually does that for text files, not binary files... Not sure. Any ideas?

My plan of action right now is to continue researching preexisting tools and, if none exist, to get comfortable reading binary data in C/C++ and create the tool myself.
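
On the "preexisting tools" side, one option worth trying before writing anything from scratch is to drive a standalone binary delta tool from a small script. The sketch below shells out to the xdelta3 command-line tool; it assumes xdelta3 is installed and on PATH, and the file names are placeholders rather than anything from the question.

# Sketch: store and reapply per-version deltas of a project file using the
# xdelta3 command-line tool (assumed installed; -e encodes, -d decodes,
# -s names the source/old file, -f overwrites existing output).
import subprocess

def save_delta(old_version: str, new_version: str, delta_path: str) -> None:
    """Encode the difference between two versions into a small delta file."""
    subprocess.run(
        ["xdelta3", "-e", "-f", "-s", old_version, new_version, delta_path],
        check=True,
    )

def restore_version(old_version: str, delta_path: str, output_path: str) -> None:
    """Rebuild a newer version from an older version plus its stored delta."""
    subprocess.run(
        ["xdelta3", "-d", "-f", "-s", old_version, delta_path, output_path],
        check=True,
    )

if __name__ == "__main__":
    # Hypothetical file names for illustration.
    save_delta("song_v1.daw", "song_v2.daw", "song_v1_to_v2.xdelta")
    restore_version("song_v1.daw", "song_v1_to_v2.xdelta", "song_v2_restored.daw")

Keeping the full latest version plus a chain of such deltas back through older versions would give the roll-back behaviour described above without storing every version in full.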

Recommended Answer

I can't comment on the reliability or connection issues that might exist when committing a large file across the network (one referenced post hinted at problems). But here is a little bit of empirical data that you may find useful (or not).

I have been doing some tests today studying disk seek times, so I had a reasonably good test case readily at hand. I found your question interesting, so I did a quick test with the files I am using/modifying. I created a local Subversion repository, added two binary files to it (sizes shown below), and then committed the files a couple of times after making changes to them. The smaller binary file (0.85 GB) simply had data added to the end of it each time. The larger file (2.2 GB) contains data representing b-trees consisting of "random" integer data. The updates to that file between commits involved adding approximately 4000 new random values, so the modified nodes would be spread somewhat evenly throughout the file.
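
In command form, the workflow above is roughly the following sketch. It assumes the svnadmin and svn command-line clients are available; the paths and the small placeholder files stand in for the real multi-gigabyte data.

# Sketch of the test workflow: create a local Subversion repository, commit
# binary files, modify them, and commit again. Paths/contents are placeholders.
import pathlib
import subprocess

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

repo = "/tmp/bigfile-repo"
wc = pathlib.Path("/tmp/bigfile-wc")

run("svnadmin", "create", repo)                    # local repository
run("svn", "checkout", f"file://{repo}", str(wc))  # working copy

(wc / "file1").write_bytes(bytes(1_000_000))       # stand-ins for the real files
(wc / "file2").write_bytes(bytes(2_000_000))
run("svn", "add", str(wc / "file1"), str(wc / "file2"))
run("svn", "commit", str(wc), "-m", "initial versions")

with open(wc / "file1", "ab") as f:                # append to the smaller file
    f.write(b"new data at the end")
with open(wc / "file2", "r+b") as f:               # change a region of the larger one
    f.seek(1_000_000)
    f.write(b"changed bytes in the middle")
run("svn", "commit", str(wc), "-m", "second versions")
# The on-disk size of the repository can then be compared with the size of the changes.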

Here are the original file sizes along with the size/count of all files in the local Subversion repository after the first commit:

file1    851,271,675  
file2  2,205,798,400 

1,892,512,437 bytes in 32 files and 32 dirs

After the second commit:

file1    851,287,155  
file2  2,207,569,920  

1,894,211,472 bytes in 34 files and 32 dirs

After the third commit:

file1    851,308,845  
file2  2,210,174,976  

1,897,510,389 bytes in 36 files and 32 dirs

The commits were somewhat lengthy. I didn't pay close attention because I was doing other work, but I think each one took maybe 10 minutes. Checking out a specific revision took about 5 minutes. Note that each commit grew the repository by only a few megabytes (about 1.7 MB and 3.3 MB), even though the working files are 0.85 GB and 2.2 GB. I would not make a recommendation one way or the other based on my results. All I can say is that it seemed to work fine and no errors occurred, and the file differencing seemed to work well (for these files).
