Looking for an ultrafast data store to perform intersect operations

Question

I've been using Redis for a while as a backend for Resque, and now that I'm looking for a fast way to perform intersect operations on large sets of data, I decided to give Redis a shot.

I've been conducting the following test:

- x, y and z are Redis sets; each contains approx. 1 million members (random integers taken from a seed array containing 3M+ members).

- I want to intersect x, y and z, so I'm using sinterstore (to avoid the overhead of retrieving the data from the server to the client):

sinterstore r x y z

- The resulting set (r) contains about half a million members; Redis computes this set in approximately half a second.

Half a second is not bad, but I would need to perform such calculations on sets that could contain more than a billion members each.

I haven't tested how Redis would react with such enormous sets but I assume it would take a lot more time to process the data.

Am I doing this right? Is there a faster way to do that?

Notes:

- Native arrays aren't an option since I'm looking for a distributed data store that would be accessed by several workers.

- I get these results on an 8-core 3.4 GHz Mac with 16 GB of RAM; disk saving has been disabled in the Redis configuration.

Recommended answer

I suspect that bitmaps are your best hope.

In my experience, Redis is a perfect server for bitmaps; you would use the string data structure (one of the five data structures available in Redis).

Many or perhaps all of the operations you will need to perform are available out of the box in Redis, as atomic operations.

The Redis setbit operation has a time complexity of O(1).

In a typical implementation, you would hash your array values to offset values on the bit string, then set each bit at its corresponding offset (or index); like so:

>>> r1.setbit('k1', 20, 1)

The first argument is the key, the second is the offset (index value), and the third is the value at that index on the bitmap.
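As a rough sizing sketch for that offset argument: assuming each member is a non-negative integer used directly as its bit offset (an assumption; the question does not say how members would map to offsets), a bitmap needs one bit per possible offset. The Redis documentation for SETBIT requires offsets below 2^32, which caps a single bitmap at 512 MB. The helper name `bitmap_bytes` is made up for this illustration.

```python
# Documented SETBIT limit: offsets must be < 2**32, i.e. at most 512 MB per bitmap.
REDIS_MAX_BITMAP_BYTES = 512 * 1024 * 1024


def bitmap_bytes(max_offset: int) -> int:
    """Bytes needed for a bitmap whose highest set bit is `max_offset`.

    Hypothetical helper: assumes members are used directly as bit offsets.
    """
    if max_offset < 0:
        raise ValueError("offsets must be non-negative")
    size = max_offset // 8 + 1  # 8 bits per byte, rounded up
    if size > REDIS_MAX_BITMAP_BYTES:
        raise ValueError("offset exceeds the 2**32 - 1 limit of a single bitmap")
    return size


# Offsets 0 .. 2,999,999 (a 3M-entry seed array indexed by position)
seed_array_size = bitmap_bytes(2_999_999)

# Offsets 0 .. 999,999,999 (one billion possible members)
billion_size = bitmap_bytes(999_999_999)
```

Under these assumptions a bitmap covering one billion possible members occupies 125,000,000 bytes regardless of how many bits are actually set, so it fits within a single Redis string.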

To find whether a bit is set at this offset (20), call getbit, passing in the key for the bit string and the offset.

>>> r1.getbit('k1', 20)
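To make the semantics of these two calls concrete, here is a minimal pure-Python model of a bitmap. It is not Redis code and needs no server; the `Bitmap` class is hypothetical. It follows the documented Redis conventions: bit 0 is the most significant bit of the first byte, setbit grows the string with zero bytes as needed and returns the previous bit value, and getbit past the end of the string returns 0.

```python
class Bitmap:
    """In-memory stand-in for a Redis string used as a bitmap (illustrative only)."""

    def __init__(self):
        self.buf = bytearray()

    def setbit(self, offset: int, value: int) -> int:
        byte_index = offset >> 3          # which byte holds the bit
        mask = 0x80 >> (offset & 7)       # bit 0 is the MSB of the byte
        if byte_index >= len(self.buf):
            # Grow with zero bytes so the offset becomes addressable.
            self.buf.extend(bytes(byte_index + 1 - len(self.buf)))
        old = 1 if self.buf[byte_index] & mask else 0
        if value:
            self.buf[byte_index] |= mask
        else:
            self.buf[byte_index] &= ~mask & 0xFF
        return old                        # previous value, as SETBIT returns

    def getbit(self, offset: int) -> int:
        byte_index = offset >> 3
        if byte_index >= len(self.buf):
            return 0                      # out-of-range bits read as 0
        return 1 if self.buf[byte_index] & (0x80 >> (offset & 7)) else 0


k1 = Bitmap()
previous = k1.setbit(20, 1)   # mirrors r1.setbit('k1', 20, 1)
is_set = k1.getbit(20)        # mirrors r1.getbit('k1', 20)
neighbour = k1.getbit(21)     # a bit that was never set
```

Both operations touch a single byte, which is consistent with the O(1) complexity mentioned above (apart from the one-time cost of growing the string).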

Then, on those bitmaps, you can of course perform the usual bitwise operations, e.g., logical AND, OR, XOR.
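Applied to the question, intersecting x, y and z becomes a bitwise AND of three bitmaps. The sketch below models that in pure Python so it runs without a server; the helper names (`make_bitmap`, `bitop_and`, `bitcount`, `members`) are made up for illustration. On a real server the corresponding commands would be BITOP AND and BITCOUNT (available since Redis 2.6), which redis-py exposes as `bitop('AND', dest, *keys)` and `bitcount(key)`.

```python
def make_bitmap(offsets, size_bits):
    """Build a bitmap (bytes) with the given offsets set; bit 0 is the MSB of byte 0."""
    buf = bytearray((size_bits + 7) // 8)
    for off in offsets:
        buf[off >> 3] |= 0x80 >> (off & 7)
    return bytes(buf)


def bitop_and(*bitmaps):
    """Bitwise AND of several bitmaps; shorter inputs are zero-padded, as BITOP does."""
    n = max(len(b) for b in bitmaps)
    acc = (1 << (8 * n)) - 1  # start with all bits set
    for b in bitmaps:
        acc &= int.from_bytes(b.ljust(n, b"\x00"), "big")
    return acc.to_bytes(n, "big")


def bitcount(bitmap):
    """Number of set bits, i.e. the size of the intersection."""
    return sum(bin(byte).count("1") for byte in bitmap)


def members(bitmap):
    """Decode a bitmap back into the sorted list of offsets that are set."""
    return [i * 8 + j
            for i, byte in enumerate(bitmap)
            for j in range(8)
            if byte & (0x80 >> j)]


# Tiny stand-ins for the sets x, y and z from the question.
x = make_bitmap([1, 5, 9, 20], size_bits=32)
y = make_bitmap([5, 9, 20, 30], size_bits=32)
z = make_bitmap([0, 5, 20, 31], size_bits=32)

r = bitop_and(x, y, z)        # server-side analogue: BITOP AND r x y z
common = members(r)           # offsets present in all three bitmaps
common_count = bitcount(r)    # server-side analogue: BITCOUNT r
```

The cost of the AND depends on the length of the bitmaps rather than on how many members each set holds, so this approach suits dense sets over a bounded integer range; whether it beats sinterstore for a given workload would need to be measured.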
