Which clustered NoSQL DB for a Message Storing purpose?


Problem description


Yet another question about which NoSQL to choose. However, I haven't yet found anyone asking about this particular purpose: message storing.

I have built an Erlang chat server, and I'm already using MySQL to store friend lists and other information that needs JOINs.

I would like to store messages (those a user has not received because he was offline) and retrieve them later.

I have made a pre-selection of NoSQL databases. I can't use things like MongoDB because of its RAM-oriented paradigm and because it fails to cluster like the others. I have narrowed my list down to three choices, I guess:

  • Hbase
  • Riak
  • Cassandra

I know that their models are quite different, one using key/value, the others using SuperColumns and so on.

Until now I have had a preference for Riak because of its stable client library for Erlang.

I know that I can use Cassandra with Thrift, but it does not seem very stable with Erlang (I haven't heard good feedback about it).

I don't really know anything about HBase right now; I just know it exists and, as far as I understand, is based on Dynamo like Cassandra and Riak.

So here's what I need to do:

  • Store from 1 to X messages per registered user.
  • Get the number of stored messages per user.
  • Retrieve all messages from a user at once.
  • Delete all messages from a user at once.
  • Delete all messages that are older than X months.

I'm really new to these NoSQL databases and have always been a MySQL aficionado, which is why I'm asking. Could someone with more experience help a newbie choose the one that would let me do everything I need without too much hassle?

Thanks!

Solution

I can't speak for Cassandra or Hbase, but let me address the Riak part.

Yes, Riak would be appropriate for your scenario (and I've seen several companies and social networks use it for a similar purpose).

To implement this, you would need the plain Riak Key/Value operations, plus some sort of indexing engine. Your options are (in rough order of preference):

  1. CRDT Sets. If your 1-N collection size is reasonably sized (let's say, there's less than 50 messages per user or whatever), you can store the keys of the child collection in a CRDT Set Data Type.

  2. Riak Search. If your collection size is large, and especially if you need to search your objects on arbitrary fields, you can use Riak Search. It spins up Apache Solr in the background, and indexes your objects according to a schema you define. It has pretty awesome searching, aggregation and statistics, geospatial capabilities, etc.

  3. Secondary Indexes. You can run Riak on top of an eLevelDB storage back end, and enable Secondary Index (2i) functionality.

Run a few performance tests to pick the fastest approach for your workload.

As far as schema goes, I would recommend using two buckets (for the setup you describe): a User bucket, and a Message bucket.
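To make option 1 more concrete, here is a minimal in-memory sketch in Python of keeping a per-user set of message keys next to the messages themselves. Plain dicts and a Python `set` stand in for Riak buckets and the CRDT Set data type; `MessageStore`, `store`, `count` and `fetch_all` are hypothetical names chosen for illustration, not Riak client API.

```python
import uuid


class MessageStore:
    """In-memory stand-in: one 'bucket' of messages plus one key set per user."""

    def __init__(self):
        self.messages = {}   # message_key -> message body
        self.user_sets = {}  # user_key -> set of message keys (plays the role of the Set)

    def store(self, user_key, body):
        # Write the message, then record its key in the owner's set.
        message_key = uuid.uuid4().hex
        self.messages[message_key] = body
        self.user_sets.setdefault(user_key, set()).add(message_key)
        return message_key

    def count(self, user_key):
        # The message count is just the size of the user's key set.
        return len(self.user_sets.get(user_key, set()))

    def fetch_all(self, user_key):
        # Read the key set, then fetch each message by key.
        return [self.messages[k] for k in self.user_sets.get(user_key, set())]


store = MessageStore()
store.store("alice", "hi")
store.store("alice", "are you there?")
print(store.count("alice"))
```

In a real cluster the set and the messages are separate objects, so each write touches two keys; that is one reason this option suits small collections.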

Index the message bucket. (Either by associating a Search index with it, or by storing a user_key via 2i). This lets you do all of the required operations (and the message log does not have to fit into memory):

  • Store from 1 to X messages per registered user - Once you create a User object and get a user key, storing an arbitrary number of messages per user is easy. They are straight-up writes to the Message bucket, each message storing the appropriate user_key as a secondary index.
  • Get the number of stored messages per user - No problem. Get the list of message keys belonging to a user (via a search query, by retrieving the Set object where you're keeping the keys, or via a 2i query on user_key). This lets you get the count on the client side.
  • Retrieve all messages from a user at once - See the previous item. Get the list of keys of all messages belonging to the user (via Search, Sets or 2i), and then fetch the actual messages for those keys by multi-fetching the values for each key (all the official Riak clients have a client-side multiFetch capability).
  • Delete all messages from a user at once - Very similar. Get the list of message keys for the user, then issue Deletes for them on the client side.
  • Delete all messages that are older than X months - You can add an index on Date. Then retrieve all message keys older than X months (via Search or 2i), and issue client-side Deletes for them.
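The five operations above can be sketched with a small in-memory model in Python. This is only an illustration of the key-lookup-then-fetch/delete pattern: `MessageBucket` and all its methods are hypothetical names, a dict stands in for the Message bucket, and linear scans stand in for the 2i (or Search) queries on user_key and date. "X months" is approximated as 90 days.

```python
from datetime import datetime, timedelta


class MessageBucket:
    """In-memory stand-in for a Message bucket indexed by user_key and date."""

    def __init__(self):
        self.objects = {}  # message_key -> (user_key, created_at, body)
        self._next_id = 0

    def put(self, user_key, body, created_at):
        # Store a message together with the fields that would be indexed.
        key = f"msg-{self._next_id}"
        self._next_id += 1
        self.objects[key] = (user_key, created_at, body)
        return key

    def keys_by_user(self, user_key):
        # Stands in for an exact-match index query on user_key.
        return [k for k, (u, _, _) in self.objects.items() if u == user_key]

    def keys_older_than(self, cutoff):
        # Stands in for a range query on the date index.
        return [k for k, (_, t, _) in self.objects.items() if t < cutoff]

    def multi_fetch(self, keys):
        # Client-side fetch of each key's value.
        return [self.objects[k][2] for k in keys]

    def delete_keys(self, keys):
        # Client-side delete of each key.
        for k in keys:
            self.objects.pop(k, None)


now = datetime(2024, 6, 1)
bucket = MessageBucket()
bucket.put("alice", "old", now - timedelta(days=200))
bucket.put("alice", "new", now - timedelta(days=1))
bucket.put("bob", "hello", now - timedelta(days=2))

count = len(bucket.keys_by_user("alice"))                     # count per user
all_alice = bucket.multi_fetch(bucket.keys_by_user("alice"))  # retrieve all
bucket.delete_keys(bucket.keys_older_than(now - timedelta(days=90)))  # expire old
bucket.delete_keys(bucket.keys_by_user("bob"))                # delete all for a user
```

Every operation reduces to "get the keys from an index, then act on them client-side", which is why the choice of indexing mechanism is the main design decision.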
