Which database to choose (Cassandra, MongoDB, ?) for storing and querying event / log / metrics data?

Question

In sql terms we're storing data like this:

table events (
  id
  timestamp
  dimension1
  dimension2
  dimension3
  etc.
)


All dimension values are integers. This table is becoming very large.

We want stupidly fast reads for queries like this:

SELECT dimension1, dimension2, COUNT(*) 
FROM   events
WHERE  dimension8 = 'foo'
AND    dimension9 = 'bar'
GROUP BY 1, 2

We want fast writes, and don't care about transactions and consistency. We care about eventual availability and partition tolerance.

I was looking at "NoSQL" alternatives. Can Cassandra do the kind of queries I'm looking for? This isn't immediately obvious from reading their docs. If it can, what is its performance for those types of queries?

Was also looking at MongoDB, but their "group()" function has severe limitations as far as I could read (max of 10,000 rows).

Do you have experience with any of these databases, and would you recommend it as a solution to the problem described above?

Are there any other databases I should consider that can do this kind of query fast?

Cheers, jimmy

Recommended answer

"Group by" and "stupidly fast" do not go together. That's just the nature of that beast... Hence the limitations on Mongo's group operation; Cassandra doesn't even support it natively (although it does for Hive or Pig queries via Hadoop... but those are not intended to be stupidly fast).

Systems like Twitter's Rainbird (which uses Cassandra) do realtime analytics by denormalizing and pre-computing the counts: http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
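
To make the pre-computing idea concrete, here is a minimal in-memory sketch in Python. It is not Rainbird's implementation. The names FILTER_DIMS, GROUP_DIMS, record_event and query are hypothetical, and the sketch assumes you know in advance which dimensions you filter on and which you group by. Each write increments a counter keyed by those values, so the read becomes a key lookup instead of a scan with GROUP BY.

```python
from collections import Counter, defaultdict

# Assumption: the query shape is fixed ahead of time. These names are
# illustrative only and do not come from any particular database API.
FILTER_DIMS = ("dimension8", "dimension9")
GROUP_DIMS = ("dimension1", "dimension2")

# filter values -> (group values -> count). A real system would keep these
# counters in a durable store rather than in process memory.
counts = defaultdict(Counter)


def record_event(event):
    """At write time, bump the pre-computed counter this event belongs to."""
    filter_key = tuple(event[d] for d in FILTER_DIMS)
    group_key = tuple(event[d] for d in GROUP_DIMS)
    counts[filter_key][group_key] += 1


def query(dimension8, dimension9):
    """Return {(dimension1, dimension2): count} with a single key lookup."""
    return dict(counts.get((dimension8, dimension9), {}))


if __name__ == "__main__":
    record_event({"dimension1": 1, "dimension2": 2, "dimension8": 7, "dimension9": 9})
    record_event({"dimension1": 1, "dimension2": 2, "dimension8": 7, "dimension9": 9})
    record_event({"dimension1": 3, "dimension2": 4, "dimension8": 7, "dimension9": 9})
    print(query(7, 9))
```

The trade-off is that every query shape you want to answer quickly needs its own set of counters, so writes get more expensive and ad hoc queries are not covered.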
