Is Git worth it for managing many files bigger than 500 MB?


Question

I would like to put a large amount of data under version control: a directory structure (with depth <= 5) containing hundreds of files of about 500 MB each.

What I need is a system that helps me:

- detect whether a file has changed
- detect whether files were added or removed
- clone the entire repository to another location
- store a "checkpoint" and restore it later

I don't need SHA-1 for change detection; something faster is acceptable.

Is Git worth it for this? Is there a better alternative?
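For concreteness, here is a minimal sketch of how that checkpoint workflow maps onto plain Git commands (the paths, tag name and commit message are made up); note that git add content-hashes every file with SHA-1, which is exactly the cost the question wants to avoid:

```bash
# Put the data tree under version control (hypothetical path)
git init /data
cd /data

# Detect changed / added / removed files
git status --short

# Store a checkpoint: stage everything and commit it
git add -A                      # note: SHA-1-hashes every file
git commit -m "checkpoint"
git tag checkpoint-1            # hypothetical tag name

# Clone the entire repository to another location (hypothetical destination)
git clone /data /backup/data

# Restore the checkpoint later
git checkout checkpoint-1
```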

Solution

As I mentioned in "What are the Git limits", Git is not made to manage big files (or big binary files, for that matter).

Git would be needed if you needed to:

• know what has actually changed within a file. But for directory-level detection, the other answers are better (Unison or rsync); see the sketch after this list.
• keep a close proximity (i.e. the "same referential") between your development data and those large resources. Having only one referential would help, but then you would need a fork of Git, like git-bigfiles, to manage them efficiently.
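For the directory-level case, here is a sketch of what rsync change detection looks like (the paths are placeholders). By default rsync compares file size and modification time rather than checksumming, which matches the "faster than SHA-1" requirement:

```bash
# Dry run: report what differs between the two trees without copying anything.
# -a preserves metadata, --delete reports removals, -i itemizes each change.
rsync -a --delete --dry-run --itemize-changes /data/ /mirror/data/

# Typical output:
#   >f.st......  bigfile.bin    (size/mtime changed)
#   >f+++++++++  newfile.bin    (added)
#   *deleting    oldfile.bin    (removed from the source)
```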





Note: still using Git, you can try this approach:

Unfortunately, rsync isn't really perfect for our purposes either.

• First of all, it isn't really a version control system. If you want to store multiple revisions of a file, you have to make multiple copies, which is wasteful, or xdelta them, which is tedious (and potentially slow to reassemble, and makes it hard to prune intermediate versions; see the sketch after this list), or check them into git, which will still melt down because your files are too big.

• Plus, rsync really can't handle file renames properly, at all.
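To illustrate the xdelta option, a sketch using xdelta3 with made-up file names:

```bash
# Encode revision 2 as a delta against revision 1
xdelta3 -e -s bigfile.v1 bigfile.v2 v1-to-v2.xdelta

# Decode: reassembling v2 requires v1, so every revision in the chain must
# be kept around -- the "hard to prune intermediate versions" problem
xdelta3 -d -s bigfile.v1 v1-to-v2.xdelta bigfile.v2.restored
```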



Okay, what about another idea: let's split the file into chunks, and check each of those blocks into git separately.
Then git's delta compression won't have too much to chew on at a time, and we only have to send the modified blocks...
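A minimal sketch of that chunking idea using GNU split (the file name is a placeholder, and the 8 MB chunk size is an arbitrary choice):

```bash
# Cut the big file into fixed-size chunks and track the chunks instead
split --bytes=8M bigfile.bin bigfile.bin.chunk.
git add bigfile.bin.chunk.*
git commit -m "bigfile.bin as 8 MB chunks"

# split's aa, ab, ... suffixes sort lexically, so a glob reassembles the file
cat bigfile.bin.chunk.* > bigfile.bin.restored
```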


This is based on gzip --rsyncable, with a POC available in this Git repo.
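For reference, the gzip flag mentioned above; note that --rsyncable originated as a Debian/Ubuntu patch and is not present in every gzip build:

```bash
# Reset the compressor state periodically so a local change in the input
# only perturbs a local region of the compressed output
gzip --rsyncable --keep bigfile.bin   # produces bigfile.bin.gz
```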

