Fastest way to retrieve/store millions of small binary objects

Question

I am looking for a fast (as in high performance, not a quick fix) solution for persisting and retrieving tens of millions of small (around 1 KB) binary objects. Each object should have a unique ID for retrieval (preferably a GUID or SHA). An additional requirement is that it should be usable from .NET and shouldn't require additional software installation.

Currently, I am using an SQLite database with a single table for this job, but I want to get rid of the overhead of processing simple SQL instructions like SELECT data FROM store WHERE id = id.

I've also tested direct filesystem persistence under NTFS, but the performance degrades very fast as soon as it reaches half a million objects.

P.S. By the way, objects never need to be deleted, and the insertion rate is very, very low. In fact, every time an object changes, a new version is stored and the previous version remains. This is actually a requirement, to support time-traveling.

Just adding some additional information to this thread:

To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem http://arxiv.org/abs/cs.DB/0701168

Recommended answer

You may be able to lessen the performance problems of NTFS by breaking the object's GUID identifier up into pieces and using them as directory names. That way, each directory only contains a limited number of subdirectories or files.

For example, if the identifier is aaaa-bb-cc-ddddeeee, the path to the item would be c:\store\aaaa\bbcc\dddd\eeee.dat. Each path component is four hexadecimal digits, so no directory holds more than 65,536 (64k) entries.
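
The directory-sharding idea can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python rather than .NET code, and the helper names (object_path, store, load), the two-level split, and the ".dat" suffix are assumptions made for the example, not part of the original answer. It also assumes the leading hex digits of the ID are evenly distributed, which holds for random GUIDs or SHA hashes but not necessarily for sequential GUIDs.

```python
import uuid
from pathlib import Path


def object_path(root: Path, object_id: uuid.UUID) -> Path:
    """Map an ID to a sharded path: root/xxxx/yyyy/<remaining hex>.dat."""
    h = object_id.hex  # 32 lowercase hex digits, no dashes
    # Two directory levels of 4 hex digits each (an assumed split):
    # each level can hold at most 16**4 = 65,536 subdirectories.
    return root / h[0:4] / h[4:8] / (h[8:] + ".dat")


def store(root: Path, object_id: uuid.UUID, data: bytes) -> Path:
    """Write the object, creating the shard directories on demand."""
    path = object_path(root, object_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


def load(root: Path, object_id: uuid.UUID) -> bytes:
    """Read the object back by ID; no directory scan is needed."""
    return object_path(root, object_id).read_bytes()
```

The same mapping should port directly to .NET by building the path with Path.Combine from substrings of the GUID's hex form. How many levels to use, and how wide each one is, is a tuning choice that depends on the number of objects and would need to be measured on the target filesystem.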
