Can I use Subversion for a multi-gigabyte data set?

Question

The data set is 97984 files in 6766 folders, 2.57 GB in total. Many of them are binary files.

To me this does not sound like much. The daily change rate is in the hundreds of KB, across maybe 50 files. But I'm worried that Subversion will become extremely slow.

It was never fast anyway, and last time, at v1.2, the recommendation was to split the data into multiple repositories. No, I don't like that.

Is there a way to tell Subversion, or any other free open source version control system, to trust the file modification time and file size to detect changes instead of comparing all the files? With that, and with the data on a fast modern SSD, it should run fast: say, less than 6 seconds for a complete commit (that's 3x longer than getting the summary from the Windows Explorer properties dialog).

Recommended answer

I've just done a benchmark on my machine to see what this is like:

Data size: 2.3 GB (84000 files in 6000 directories, random textual data)
Checkout time: 14 minutes
Changed: 500 files (14 MB of data changes)
Commit time: 50 seconds

To get an idea of how long it would take to compare all those files manually, I also ran a diff between two exports of that data (version 1 against version 2).

Diff time: 55 minutes

I'm not sure an SSD would bring the commit time down as much as you hope, but I was using a normal single SATA disk for both the 50-second and the 55-minute measurements.

To me, these times strongly suggest that svn does not check the contents of every file by default: the 50-second commit is far shorter than the 55 minutes a full content comparison took.

This was with svn 1.6.
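
For illustration, the kind of check the question asks about (trusting file size and modification time rather than reading file contents) can be sketched in a few lines of Python. This is only a sketch of the general technique, not Subversion's actual implementation. The names `take_snapshot` and `changed_paths` are made up for this example.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical snapshot format: relative path -> (size in bytes, mtime in ns).
Snapshot = Dict[str, Tuple[int, int]]


def take_snapshot(root: str) -> Snapshot:
    """Record size and modification time of every file under root.

    Only file metadata is read (one stat call per file); file contents
    are never opened.
    """
    snapshot: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            st = os.stat(full_path)
            rel_path = os.path.relpath(full_path, root)
            snapshot[rel_path] = (st.st_size, st.st_mtime_ns)
    return snapshot


def changed_paths(old: Snapshot, new: Snapshot) -> List[str]:
    """Return paths that were added, removed, or whose (size, mtime) differ."""
    all_paths = old.keys() | new.keys()
    return sorted(p for p in all_paths if old.get(p) != new.get(p))
```

Only the files reported by `changed_paths` would then need a content comparison. The trade-off is that a change which preserves both size and timestamp goes unnoticed.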
