Is it a good idea to store hundreds of millions of small images in a key/value store or other NoSQL database?


Question

I am developing a web system to handle a very large set of small images, about 100 million images of 50 KB ~ 200 KB each, working on ReiserFS.

For now, it is very difficult to back up and sync such a large number of small files.

My question is: would it be a good idea to store these small images in a key/value store or another NoSQL database, such as GridFS (MongoDB), Tokyo Tyrant, or Voldemort, to gain more performance and better backup support?
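For concreteness, storing and fetching one image through GridFS could look like the sketch below, using MongoDB's mongofiles command-line tool (the host and database names here are hypothetical, purely for illustration):

    # Put one image into GridFS, then read it back (hypothetical host/db).
    mongofiles --host mongo.example.tld -d images put 0001/photo.jpg
    mongofiles --host mongo.example.tld -d images get 0001/photo.jpg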

Answer

First off, have a look at this: Storing a million images in the filesystem. While it isn't about backups, it is a worthwhile discussion of the topic at hand.

And yes, large numbers of small files are pesky; they take up inodes and require space for filenames, etc. (and it takes time to back up all this metadata). Basically, it sounds like you have serving the files figured out; if you run it on nginx, with Varnish or similar in front, you can hardly make it any faster. Adding a database under that will only make things more complicated, also when it comes to backing up. Alas, I would suggest working harder on an in-place FS backup strategy.
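As a quick way to see how heavy that metadata burden actually is, you can count the files and check inode consumption directly (the path below is a hypothetical mount point):

    find /var/images -type f | wc -l   # how many image files exist
    df -i /var/images                  # inode usage on that filesystem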

First off, have you tried rsync with the -az switches (archive and compression, respectively)? It tends to be highly effective, since rsync doesn't transfer the same files again and again.
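A minimal sketch of such a run, with a hypothetical source path and destination host:

    # -a preserves permissions/timestamps, -z compresses on the wire;
    # only files that changed since the last run are transferred.
    rsync -az /var/images/ backup@destination.example.tld:/backup/images/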

Alternatively, my suggestion would be to tar + gzip the images into a number of archive files. For example (assuming you have them in different sub-folders):

# Create one gzipped tarball per top-level sub-folder and stream it
# to the backup host. (Note: ssh has no -z flag; gzip already
# compresses the payload, so wire compression is redundant anyway.)
for prefix in *; do
    tar -c "$prefix" | gzip -c -9 | ssh destination.example.tld "cat > backup_$(date -I)_$prefix.tar.gz"
done

This will create a number of .tar.gz files that can be transferred easily and without too much overhead.
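As an optional follow-up (not part of the original answer), each archive can be sanity-checked on the destination with gzip's built-in integrity test:

    # Verify that every transferred archive decompresses cleanly.
    ssh destination.example.tld 'for f in backup_*.tar.gz; do gzip -t "$f" && echo "$f OK"; done'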
