MongoDB group by slow even after adding an index

Question

I have a simple collection:

{
    "_id" : ObjectId("5033cc15f31e20b76ca842c8"),
    "_class" : "com.pandu.model.alarm.Alarm",
    "serverName" : "CDCAWR009 Integration Service",
    "serverAddress" : "cdcawr009.na.convergys.com",
    "triggered" : ISODate("2012-01-28T05:09:03Z"),
    "componentName" : "IntegrationService",
    "summary" : "A device which is configured to be recorded is not being recorded.",
    "details" : "Extension<153; 40049> on CDCAWR009 is currently not being recorded properly; recording requested for the following reasons: ",
    "priority" : "Major"
}

There will be around a couple of million such documents in the collection. I am trying to group by server name and get a count for each server name. It sounds simple from an RDBMS query point of view.

The query that I have come up with is:

    db.alarm.group( {key: { serverName:true }, reduce: function(obj,prev) { prev.count++ }, initial: { count: 0 }});
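
To make the cost concrete, here is a minimal in-memory sketch, in plain JavaScript runnable under Node.js rather than in the mongo shell, of what the `group()` call above computes. The helper `simulateGroup` and the sample documents are hypothetical and only mirror the semantics of `key`, `reduce` and `initial`; the point is that the `reduce` function runs once for every document, and because the query has no condition, every document is visited.

```javascript
// Hypothetical in-memory model of db.alarm.group({ key, reduce, initial }).
// It mirrors the semantics only; it is NOT MongoDB's implementation.
function simulateGroup(docs, keyField, reduce, initial) {
  const buckets = new Map(); // key value -> accumulator ("prev") document
  let reduceCalls = 0;
  for (const obj of docs) {
    const k = obj[keyField];
    if (!buckets.has(k)) {
      // Each new key starts from a copy of `initial` (a shallow copy is
      // enough for a flat accumulator such as { count: 0 }).
      buckets.set(k, Object.assign({ [keyField]: k }, initial));
    }
    reduce(obj, buckets.get(k)); // one JavaScript call per document
    reduceCalls++;
  }
  return { groups: [...buckets.values()], reduceCalls };
}

// Hypothetical sample data; "server-b" is made up for illustration.
const docs = [
  { serverName: "CDCAWR009 Integration Service", priority: "Major" },
  { serverName: "CDCAWR009 Integration Service", priority: "Minor" },
  { serverName: "server-b", priority: "Major" },
];

const { groups, reduceCalls } = simulateGroup(
  docs,
  "serverName",
  function (obj, prev) { prev.count++; }, // same reduce as in the question
  { count: 0 }
);

console.log(JSON.stringify(groups));
// [{"serverName":"CDCAWR009 Integration Service","count":2},{"serverName":"server-b","count":1}]
console.log(reduceCalls); // 3
```

On a couple of million documents this means a couple of million calls into the JavaScript interpreter, which is the single-threaded JavaScript cost discussed in the answer.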

Also, I have added an index on serverName.

> db.alarm.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "test.alarm",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "serverName" : 1
                },
                "ns" : "test.alarm",
                "name" : "serverName_1"
        }
]

However, MongoDB takes 13 seconds to respond, whereas in SQL Server a similar query returns within 4 seconds, and that is without an index.

Is there something I am missing?

Thanks in advance.

Answer

As the query you wrote shows, this type of aggregation in 2.0 has to go through JavaScript: group() runs a JavaScript reduce function, just as Map/Reduce does. Map/Reduce on MongoDB has performance penalties that have been covered on Stack Overflow before. Basically, unless you are able to parallelize across a sharded cluster, you are going to be running single-threaded JavaScript via SpiderMonkey, which is not a speedy proposition. The index does not really help either, because your query is not selective: with no filter, you still have to scan the whole index and potentially every document as well.

With the imminent release of 2.2 (at rc1 as of this writing), you do have some options. The aggregation framework introduced in 2.2 is native rather than JavaScript-based like Map/Reduce, has a built-in $group operator, and was created specifically to speed up this kind of operation in MongoDB.

I would recommend giving 2.2 a shot and seeing whether your grouping performance improves. I think it would look something like this (note: not tested):

db.alarm.aggregate(
    { $group : {
        _id : "$serverName",
        count : { $sum : 1 }
    }}
);
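
For comparison, the following plain-JavaScript sketch models the result shape of the `$group` stage above: one output document per distinct `serverName`, with `_id` holding the group key and `count` increased by 1 for each input document (`$sum: 1`). The helper `groupStage` and the sample data are hypothetical; the real stage is evaluated natively inside the server, which is where the speedup over a JavaScript `reduce` comes from.

```javascript
// Hypothetical in-memory model of
//   { $group: { _id: "$serverName", count: { $sum: 1 } } }
// Only the result shape is mirrored; MongoDB evaluates the stage natively.
function groupStage(docs, field) {
  const counts = new Map(); // group key -> running $sum
  for (const doc of docs) {
    // Documents missing the field are grouped under a null key.
    const key = doc[field] === undefined ? null : doc[field];
    counts.set(key, (counts.get(key) || 0) + 1); // $sum: 1 adds 1 per document
  }
  return [...counts].map(([key, n]) => ({ _id: key, count: n }));
}

// Hypothetical sample data; "server-b" is made up for illustration.
const sample = [
  { serverName: "CDCAWR009 Integration Service" },
  { serverName: "server-b" },
  { serverName: "CDCAWR009 Integration Service" },
  { priority: "Major" }, // no serverName field
];

console.log(JSON.stringify(groupStage(sample, "serverName")));
// [{"_id":"CDCAWR009 Integration Service","count":2},{"_id":"server-b","count":1},{"_id":null,"count":1}]
```

MongoDB does not guarantee the order of `$group` results, so the insertion order seen here is only an artifact of the sketch; add a `$sort` stage after `$group` if a stable order matters.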
