Data aggregation: MongoDB vs MySQL


Question

I am currently researching a backend for a project with demanding data aggregation requirements. The main project requirements are the following.

  1. Store millions of records for each user. Users might have more than 1 million entries per year, so even with 100 users we are talking about 100 million entries per year.

  2. Data aggregation on those entries must be performed on the fly. Users need to be able to filter the entries with a large number of available filters and then see summaries (totals, averages, etc.) and graphs of the results. Obviously I cannot precalculate any of the aggregation results, because the number of filter combinations (and thus of result sets) is huge.

  3. Users will have access to their own data only, but it would be nice if anonymous stats could be calculated across all the data.

  4. Most of the time the data will arrive in batches. For example, a user will upload data every day, and an upload could be around 3,000 records. In a later version there could be automated programs that upload every few minutes in smaller batches, for example 100 items.
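To make the "filter, then summarize on the fly" requirement concrete, here is a minimal sketch of a per-user filtered aggregate. It uses Python's built-in `sqlite3` module purely as a stand-in for a relational backend such as MySQL. The `entries` table, its columns (`user_id`, `category`, `year`, `amount`), the filter whitelist, and the helper names are all hypothetical and are not taken from the question.

```python
import sqlite3

# Hypothetical whitelist of columns a user may filter on.
ALLOWED_FILTERS = {"category", "year"}


def summarize(conn, user_id, filters):
    """Return (count, total, average) of `amount` for one user's filtered entries."""
    clauses = ["user_id = ?"]  # a user only ever sees their own rows
    params = [user_id]
    for column, value in filters.items():
        if column not in ALLOWED_FILTERS:
            raise ValueError(f"unsupported filter: {column}")
        clauses.append(f"{column} = ?")  # column name is whitelisted, value is bound
        params.append(value)
    sql = (
        "SELECT COUNT(*), SUM(amount), AVG(amount) FROM entries WHERE "
        + " AND ".join(clauses)
    )
    return conn.execute(sql, params).fetchone()


def make_demo_db():
    """Build a tiny in-memory database with the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (user_id INTEGER, category TEXT, year INTEGER, amount REAL)"
    )
    # An index that leads with user_id keeps each query scoped to one user's rows.
    conn.execute("CREATE INDEX idx_entries_user ON entries (user_id, year)")
    conn.executemany(
        "INSERT INTO entries VALUES (?, ?, ?, ?)",
        [
            (1, "a", 2010, 10.0),
            (1, "b", 2010, 20.0),
            (1, "a", 2011, 30.0),
            (2, "a", 2010, 99.0),
        ],
    )
    return conn
```

Calling `summarize(make_demo_db(), 1, {"year": 2010})` aggregates only user 1's rows for 2010. Because every query is scoped to a single user, the working set per aggregation is closer to that user's yearly entries than to the full 100 million rows.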

I ran a simple test: I created a table with 1 million rows and performed a simple sum of one column in both MongoDB and MySQL, and the performance difference was huge. I do not remember the exact numbers, but it was something like MySQL = 200 ms, MongoDB = 20 sec.
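A test of this shape can be sketched as follows. This is not the original benchmark: it uses an in-memory SQLite database as a stand-in, the table name `t` and the function `time_sum` are made up for illustration, and the timings it produces are not comparable to the MySQL or MongoDB figures quoted above.

```python
import sqlite3
import time


def time_sum(n_rows=1_000_000):
    """Insert n_rows integers, then time a single SUM over the column.

    Returns (total, elapsed_seconds). Only the SELECT is timed, not the inserts.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", ((i,) for i in range(n_rows)))
    conn.commit()

    start = time.perf_counter()
    (total,) = conn.execute("SELECT SUM(value) FROM t").fetchone()
    elapsed = time.perf_counter() - start

    conn.close()
    return total, elapsed
```

When comparing backends this way, it helps to time only the aggregation itself, repeat the run several times, and make sure each system uses its native aggregation path rather than a scripting layer.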

I also ran the test with CouchDB and got much worse results.

What seems promising speed-wise is Cassandra, which I was very enthusiastic about when I first discovered it. However, the documentation is scarce and I haven't found any solid examples of how to perform sums and other aggregate functions on the data. Is that possible?

Judging from my test (maybe I did something wrong), with the current performance it is impossible to use MongoDB for such a project, although the automated sharding functionality seems like a perfect fit for it.

Does anybody have experience with data aggregation in MongoDB, or any insights that might help with the implementation of the project?

Thanks, Dimitris

Recommended answer

I've never been impressed by the performance of MongoDB in use cases where JavaScript is required, for instance map-reduce jobs. Maybe it is better in 1.51; I didn't try.
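For context, MongoDB releases from 2.2 onward (later than the version discussed here) include an aggregation framework that expresses sums and averages as native pipeline stages instead of JavaScript map-reduce. The sketch below only builds such a pipeline as plain Python data. The field names (`user_id`, `amount`) and the function `build_summary_pipeline` are hypothetical, and nothing here measures performance.

```python
def build_summary_pipeline(user_id, filters, field="amount"):
    """Build a $match + $group pipeline that summarizes one user's filtered entries."""
    match = {"user_id": user_id}  # restrict to a single user's documents
    match.update(filters)         # add simple equality filters
    return [
        {"$match": match},
        {
            "$group": {
                "_id": None,                      # one group covering all matches
                "count": {"$sum": 1},
                "total": {"$sum": f"${field}"},
                "average": {"$avg": f"${field}"},
            }
        },
    ]
```

With a driver such as PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`. Whether it is fast enough for the workload in the question would still need to be measured.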

You could also try the free single-node edition of Greenplum: http://www.greenplum.com/products/single-node/ and http://www.dbms2.com/2009/10/19/greenplum-free-single-node-edition/
