Fastest way to retrieve/store millions of small binary objects

Question

I am looking for a fast (as in huge performance, not quick fix) solution for persisting and retrieving tens of millions of small (around 1k) binary objects. Each object should have a unique ID for retrieval (preferably a GUID or SHA). Additional requirements are that it should be usable from .NET and that it should not require additional software installation.

Currently, I am using an SQLite database with a single table for this job, but I want to get rid of the overhead of processing simple SQL instructions like SELECT data FROM store WHERE id = id.
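
For illustration only, here is a minimal sketch of the kind of single-table lookup described above, written in Python rather than .NET. The table and column names `store`, `id`, and `data` come from the query in the question; the column types, the 16-byte key, and the helper names `put` and `get` are assumptions. Binding the ID as a parameter keeps the SQL text constant, and Python's sqlite3 module caches compiled statements per connection, so the text is not re-parsed on every call. Whether that removes enough overhead for this workload would need to be measured.

```python
import sqlite3
import uuid

# Assumed schema: the question only names the table `store` and the
# columns `id` and `data`; the BLOB types are a guess.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store (id BLOB PRIMARY KEY, data BLOB NOT NULL)")


def put(conn, object_id: bytes, data: bytes) -> None:
    """Insert one object under its binary ID (hypothetical helper)."""
    conn.execute("INSERT INTO store (id, data) VALUES (?, ?)", (object_id, data))


def get(conn, object_id: bytes):
    """Return the stored bytes for object_id, or None if it is absent."""
    row = conn.execute(
        "SELECT data FROM store WHERE id = ?", (object_id,)
    ).fetchone()
    return row[0] if row else None


# Usage: a 16-byte GUID as the key and a 1 KB payload as the value.
key = uuid.uuid4().bytes
put(conn, key, b"\x00" * 1024)
payload = get(conn, key)
```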

I've also tested direct filesystem persistence under NTFS, but the performance degrades very fast as soon as it reaches half a million objects.

P.S. By the way, objects never need to be deleted, and the insertion rate is very, very low. In fact, every time an object changes, a new version is stored and the previous version remains. This is actually a requirement to support time-traveling.

Just adding some additional information to this thread:

To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem http://arxiv.org/abs/cs.DB/0701168

Recommended answer

You may be able to lessen the performance problems of NTFS by breaking the object's GUID identifier up into pieces and using them as directory names. That way, each directory only contains a limited number of subdirectories or files.

For example, if the identifier is aaaa-bb-cc-ddddeeee, the path to the item would be c:\store\aaaa\bbcc\dddd\eeee.dat, limiting each directory to no more than 64k subitems (four hex digits per level allow at most 16^4 = 65,536 distinct names).
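
The directory-splitting idea can be sketched as follows. This is an illustrative Python sketch, not the answerer's code and not .NET. The function names `sharded_path`, `store`, and `load`, the `levels` and `width` parameters, and the choice to keep the remaining hex digits as the file name are all assumptions. The defaults mirror the example above: three directory levels of four hex digits each.

```python
import uuid
from pathlib import Path


def sharded_path(root, object_id: uuid.UUID, levels: int = 3, width: int = 4) -> Path:
    """Map a GUID to root/<chunk>/<chunk>/.../<rest>.dat.

    Each directory level uses `width` hex digits, so a directory holds at
    most 16**width subdirectories (65,536 for width=4).
    """
    hex_id = object_id.hex  # 32 lowercase hex digits, no dashes
    path = Path(root)
    for i in range(levels):
        path = path / hex_id[i * width:(i + 1) * width]
    # Assumption: the leftover digits become the file name.
    return path / (hex_id[levels * width:] + ".dat")


def store(root, object_id: uuid.UUID, data: bytes) -> Path:
    """Write data to its sharded location, creating directories as needed."""
    path = sharded_path(root, object_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


def load(root, object_id: uuid.UUID) -> bytes:
    """Read an object back from its sharded location."""
    return sharded_path(root, object_id).read_bytes()
```

With random GUIDs and the default split, almost every object lands in its own leaf directory, so the directory count approaches the object count. A shallower split (for example `levels=2, width=2`, which gives 65,536 leaf directories) may be worth benchmarking against it; which layout performs better on NTFS is not something this sketch establishes.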
