MongoDB:慢查询,即使有索引 [英] MongoDB: Slow query, even with index

查看:93
本文介绍了MongoDB:慢查询,即使有索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个网页,该网页使用MongoDB来存储和检索各种度量.突然,在某些时候,我的网页变得如此呆滞,无法使用.事实证明,我的数据库是罪魁祸首.

I have a webpage, which uses MongoDB for storing and retrieving various measurements. Suddenly, in some point, my webpage became so sluggish it became unusable. It turns out, my database is the culprit.

我一直在寻找并没有找到解决问题的方法,对此我深表歉意,因为我是MongoDB的新手,目前正在拔头发.

I searched for and have not found any solution for my problem, and I apologize, as I am pretty new to MongoDB and pulling my hair out at the moment.

我正在使用的MongoDB版本是2.4.6,在具有20GB RAM的VM机器上,该机器运行Ubuntu服务器12.04.没有设置副本或分片.

Version of MongoDB I am using is 2.4.6, on VM Machine with 20GB RAM, which runs Ubuntu server 12.04. There is no replica or sharding set up.

首先,我将分析级别设置为2,它显示了最慢的查询:

Firstly, I set my profiling level to 2 and it revealed the slowest query:

db.system.profile.find().sort({"millis":-1}).limit(1).pretty()
{
        "op" : "query",
        "ns" : "station.measurement",
        "query" : {
                "$query" : {
                        "e" : {
                                "$gte" : 0
                        },
                        "id" : "180"
                },
                "$orderby" : {
                        "t" : -1
                }
        },
        "ntoreturn" : 1,
        "ntoskip" : 0,
        "nscanned" : 3295221,
        "keyUpdates" : 0,
        "numYield" : 6,
        "lockStats" : {
                "timeLockedMicros" : {
                        "r" : NumberLong(12184722),
                        "w" : NumberLong(0)
                },
                "timeAcquiringMicros" : {
                        "r" : NumberLong(5636351),
                        "w" : NumberLong(5)
                }
        },
        "nreturned" : 0,
        "responseLength" : 20,
        "millis" : 6549,
        "ts" : ISODate("2015-03-16T08:57:07.772Z"),
        "client" : "127.0.0.1",
        "allUsers" : [ ],
        "user" : ""
}

我使用.explain()运行了该特定查询,看起来,它按原样使用索引,但是花费的时间太长.我还在另一个功能非常弱的服务器上运行了相同的查询,并在一秒钟之内就发出了结果.

I ran that specific query with .explain() and looks like, it uses index as it should, but it takes too long. I also ran that same query on my another, drastically weaker server and sput out the results like a champ in a second.

> db.measurement.find({"id":"180", "e":{$gte:0}}).sort({"t":-1}).explain()
{
        "cursor" : "BtreeCursor id_1_t_-1_e_1",
        "isMultiKey" : false,
        "n" : 0,
        "nscannedObjects" : 0,
        "nscanned" : 660385,
        "nscannedObjectsAllPlans" : 1981098,
        "nscannedAllPlans" : 3301849,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 7,
        "nChunkSkips" : 0,
        "millis" : 7243,
        "indexBounds" : {
                "id" : [
                        [
                                "180",
                                "180"
                        ]
                ],
                "t" : [
                        [
                                {
                                        "$maxElement" : 1
                                },
                                {
                                        "$minElement" : 1
                                }
                        ]
                ],
                "e" : [
                        [
                                0,
                                1.7976931348623157e+308
                        ]
                ]
        },
        "server" : "station:27017"
}

接下来,我查看了 measurement 集合的索引,对我来说很好:

Next, I looked into indexes of measurement collection and it looked fine to me:

> db.measurement.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "station.measurement",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "t" : 1
                },
                "ns" : "station.measurement",
                "name" : "t_1"
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "d" : 1,
                        "_id" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_d_1__id_-1"
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "t" : -1,
                        "e" : 1
                },
                "ns" : "station.measurement",
                "name" : "id_1_t_-1_e_1"
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "t" : -1,
                        "e" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_t_-1_e_-1"
        }
]

以下是我收藏中的其余信息:

Here is also the rest of information of my collection:

> db.measurement.stats()
{
        "ns" : "station.measurement",
        "count" : 157835456,
        "size" : 22377799512,
        "avgObjSize" : 141.77929395027692,
        "storageSize" : 26476834672,
        "numExtents" : 33,
        "nindexes" : 5,
        "lastExtentSize" : 2146426864,
        "paddingFactor" : 1.0000000000028617,
        "systemFlags" : 0,
        "userFlags" : 0,
        "totalIndexSize" : 30996614096,
        "indexSizes" : {
                "_id_" : 6104250656,
                "t_1" : 3971369360,
                "id_1_d_1__id_-1" : 8397896640,
                "id_1_t_-1_e_1" : 6261548720,
                "id_1_t_-1_e_-1" : 6261548720
        },
        "ok" : 1
}

我尝试添加新索引,修复整个数据库,重新索引.我究竟做错了什么?我非常想方设法用尽所有帮助,对此我深表感谢.

I tried adding new index, repairing whole database, reindex. What am I doing wrong? I really appreciate any help as I desperately ran out of ideas.

更新1:

我按照Neil Lunn的建议添加了两个索引,其中一些查询的运行速度更快:

I added two indexes as suggested by Neil Lunn, some of the queries are a LOT faster:

{
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "e" : 1,
                        "t" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_e_1_t_-1",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "e" : -1,
                        "t" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_e_-1_t_-1",
                "background" : true
        }

我得到的结果很有趣(不确定它们是否相关)

Results I've got are interesting (not sure though they are relevant)

接下来的两个查询仅以"id"不同.请注意,每个查询使用不同的索引,为什么?我应该删除较旧的吗?

Next two queries differs by "id" only. Please notice, each query uses different index, why? Should I delete older ones?

> db.measurement.find({"id":"119", "e":{$gte:0}}).sort({"t":-1}).explain()
{
        "cursor" : "BtreeCursor id_1_t_-1_e_1",
        "isMultiKey" : false,
        "n" : 840747,
        "nscannedObjects" : 840747,
        "nscanned" : 1047044,
        "nscannedObjectsAllPlans" : 1056722,
        "nscannedAllPlans" : 1311344,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 4,
        "nChunkSkips" : 0,
        "millis" : 3730,
        "indexBounds" : {
                "id" : [
                        [
                                "119",
                                "119"
                        ]
                ],
                "t" : [
                        [
                                {
                                        "$maxElement" : 1
                                },
                                {
                                        "$minElement" : 1
                                }
                        ]
                ],
                "e" : [
                        [
                                0,
                                1.7976931348623157e+308
                        ]
                ]
        },
        "server" : "station:27017"
}

> db.measurement.find({"id":"180", "e":{$gte:0}}).sort({"t":-1}).explain()
{
        "cursor" : "BtreeCursor id_1_e_1_t_-1",
        "isMultiKey" : false,
        "n" : 0,
        "nscannedObjects" : 0,
        "nscanned" : 0,
        "nscannedObjectsAllPlans" : 0,
        "nscannedAllPlans" : 45,
        "scanAndOrder" : true,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "id" : [
                        [
                                "180",
                                "180"
                        ]
                ],
                "e" : [
                        [
                                0,
                                1.7976931348623157e+308
                        ]
                ],
                "t" : [
                        [
                                {
                                        "$maxElement" : 1
                                },
                                {
                                        "$minElement" : 1
                                }
                        ]
                ]
        },
        "server" : "station:27017"
}

问题可能出在其他地方吗?是什么原因导致突然的呆滞"?我还有其他几个集合,查询突然也变慢了.

Could the problem be somewhere else? What could cause that sudden "sluggishness"? I have several other collections, where queries are suddenly slower also.

哦,还有一件事.在我拥有的另一台服务器上,索引与添加新索引之前的索引相同.是的,收集要小一些,但速度要快几倍.

Oh, and another thing. On that other server I have, indexes are the same as here before I added new ones. Yes, collection is a bit smaller but it is several times faster.

推荐答案

这时在索引和查询顺序选择中都是如此.

Then point here was in both the index and query ordering selections.

如果查看来自.explain()的早期输出,您将看到表达式中"t"元素上存在最小/最大"范围.通过将求值移至末尾",可以允许其他对整体表达式更重要的过滤元素(在通过"t"扫描基本所有内容"之前,确定"e"的较少匹配项是主要因素)

If you look at your earlier output from .explain() you will see that that there is a "min/max" range on the "t" element in your expression. By "moving that to the end" of the evaluation, you allow other filtering elements that are more important to the overall expression ( determine less possible matches of "e" to be the main factor before scanning though "t" in basically "everything".

这有点DBA,但是在NoSQL世界中,我确实相信这已成为程序员的问题.

It's a little bit DBA, but in the NoSQL world I do believe this becomes a programmer problem.

为了获得最有效的扫描,您基本上需要沿着所选键构建最短匹配路径".这就是更改后的结果执行速度要快得多的原因.

You essentially need to construct your "shortest match path" along the selected keys in order to get the most effective scan. That is why the altered results executes much faster.

这篇关于MongoDB:慢查询,即使有索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆