mongodb groupby slow even after adding index
Problem description
I have a simple collection:
{
"_id" : ObjectId("5033cc15f31e20b76ca842c8"),
"_class" : "com.pandu.model.alarm.Alarm",
"serverName" : "CDCAWR009 Integration Service",
"serverAddress" : "cdcawr009.na.convergys.com",
"triggered" : ISODate("2012-01-28T05:09:03Z"),
"componentName" : "IntegrationService",
"summary" : "A device which is configured to be recorded is not being recorded.",
"details" : "Extension<153; 40049> on CDCAWR009 is currently not being recorded properly; recording requested for the following reasons: ",
"priority" : "Major"
}
There will be around a couple of million such documents in the collection. I am trying to group by server name and get a count for each server name. This sounds simple from an RDBMS query point of view.
The query that I have come up with is:
db.alarm.group( {key: { serverName:true }, reduce: function(obj,prev) { prev.count++ }, initial: { count: 0 }});
I have also added an index on serverName:
> db.alarm.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "test.alarm",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"serverName" : 1
},
"ns" : "test.alarm",
"name" : "serverName_1"
}
]
However, MongoDB takes 13 seconds to respond, whereas in SQL Server a similar query returns within 4 seconds, and that is without an index.
Is there anything I am missing?
Thanks in advance.
Recommended answer
As you can see from the query that you wrote, this type of aggregation in 2.0 requires you to run Map/Reduce. Map/Reduce on MongoDB has some performance penalties, which have been covered on Stack Overflow before. Basically, unless you are able to parallelize across a cluster, you are going to be running single-threaded JavaScript via SpiderMonkey, which is not a speedy proposition. The index does not really help because you are not being selective: you have to scan the whole index, and potentially the documents as well.
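To illustrate why the index does not help here, the following is a minimal plain-JavaScript sketch of the work that the group() call in the question asks for. It is not MongoDB's implementation, and groupCount and sampleDocs are hypothetical names used only for illustration. The point is that the reduce step runs once per document, so every one of the roughly two million documents has to be visited.

```javascript
// Plain-JavaScript sketch of what group() computes; NOT MongoDB's implementation.
// groupCount and sampleDocs are hypothetical names used only for illustration.
function groupCount(docs, keyField) {
  const buckets = new Map();
  let reduceCalls = 0;
  for (const doc of docs) {
    const key = doc[keyField];
    if (!buckets.has(key)) {
      // Equivalent of `initial: { count: 0 }`, plus the grouping key.
      buckets.set(key, { [keyField]: key, count: 0 });
    }
    // Equivalent of `reduce: function(obj, prev) { prev.count++ }`.
    buckets.get(key).count++;
    reduceCalls++;
  }
  return { results: Array.from(buckets.values()), reduceCalls };
}

// Tiny stand-in for the alarm collection (made-up server names).
const sampleDocs = [
  { serverName: "server-a" },
  { serverName: "server-b" },
  { serverName: "server-a" },
];

const grouped = groupCount(sampleDocs, "serverName");
console.log(grouped.results);
console.log(grouped.reduceCalls);
```

The number of reduce calls equals the number of documents regardless of how many distinct server names exist, which is why a non-selective group cannot be made cheap by an index alone.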
With the imminent release of 2.2 (in rc1 at the time of writing), you have some options though. The aggregation framework introduced in 2.2, which is native rather than JS-based Map/Reduce, has a built-in $group operator and was created specifically to speed up this kind of operation in MongoDB.
I would recommend giving 2.2 a shot to see whether your grouping performance improves. I think it would look something like this (note: not tested):
db.alarm.aggregate(
{ $group : {
_id : "$serverName",
count : { $sum : 1 }
}}
);
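As a possible variation, and assuming the shell's aggregate() helper accepts the pipeline as an array (an assumption, not something stated above), the same $group stage can be followed by a $sort stage so that the servers with the most alarms come first. The $sort stage and the name pipeline are additions for illustration only; like the snippet above, this has not been run against a real server.

```javascript
// Pipeline as a plain array of stages; `pipeline` is an illustrative name.
const pipeline = [
  // Same grouping as above: one output document per distinct serverName.
  { $group: { _id: "$serverName", count: { $sum: 1 } } },
  // Added for illustration: order the groups by count, largest first.
  { $sort: { count: -1 } },
];

// In the mongo shell this would be passed as (assumed, untested):
// db.alarm.aggregate(pipeline);
console.log(JSON.stringify(pipeline));
```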