MongoDB simple prefix query with regex and sort is slow

Question

I'm stuck with this simple prefix query. Although Mongo docs state that you can get pretty good performance by using the prefix regex format (/^a/), the query is pretty slow when I try to sort the results:

940 millis


db.posts.find({hashtags: /^noticias/ }).limit(15).sort({rank : -1}).hint('hashtags_1_rank_-1').explain()



{
"cursor" : "BtreeCursor hashtags_1_rank_-1 multi",
"isMultiKey" : true,
"n" : 15,
"nscannedObjects" : 142691,
"nscanned" : 142692,
"nscannedObjectsAllPlans" : 142691,
"nscannedAllPlans" : 142692,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 934,
"indexBounds" : {
    "hashtags" : [
        [
            "noticias",
            "noticiat"
        ],
        [
            /^noticias/,
            /^noticias/
        ]
    ],
    "rank" : [
        [
            {
                "$maxElement" : 1
            },
            {
                "$minElement" : 1
            }
        ]
    ]
},
"server" : "XRTZ048.local:27017"
}

However, the unsorted version of the same query is super fast:

0 millis


db.posts.find({hashtags: /^noticias/ }).limit(15).hint('hashtags_1_rank_-1').explain()



{
"cursor" : "BtreeCursor hashtags_1_rank_-1 multi",
"isMultiKey" : true,
"n" : 15,
"nscannedObjects" : 15,
"nscanned" : 15,
"nscannedObjectsAllPlans" : 15,
"nscannedAllPlans" : 15,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
    "hashtags" : [
        [
            "noticias",
            "noticiat"
        ],
        [
            /^noticias/,
            /^noticias/
        ]
    ],
    "rank" : [
        [
            {
                "$maxElement" : 1
            },
            {
                "$minElement" : 1
            }
        ]
    ]
},
"server" : "XRTZ048.local:27017"

}

The query is also fast if I remove the regex and sort:

0 millis


db.posts.find({hashtags: 'noticias' }).limit(15).sort({rank : -1}).hint('hashtags_1_rank_-1').explain()



{
"cursor" : "BtreeCursor hashtags_1_rank_-1",
"isMultiKey" : true,
"n" : 15,
"nscannedObjects" : 15,
"nscanned" : 15,
"nscannedObjectsAllPlans" : 15,
"nscannedAllPlans" : 15,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
    "hashtags" : [
        [
            "noticias",
            "noticias"
        ]
    ],
    "rank" : [
        [
            {
                "$maxElement" : 1
            },
            {
                "$minElement" : 1
            }
        ]
    ]
},
"server" : "XRTZ048.local:27017"

}

It seems like using both regex and sort makes Mongo scan lots of records. However, sort is scanning just 15 if I don't use the regex. What's wrong here?

Recommended answer

The scanAndOrder: true in the explain output indicates that the query has to retrieve the documents and then sort them in memory before the output is returned. This is an expensive operation, and it hurts the performance of your query.
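To see why the sort ends up in memory for the prefix query but not for the exact match, here is a minimal sketch in plain JavaScript. It models the { hashtags: 1, rank: -1 } index as a small sorted array; the entries, the extra hashtag values, and the helper isRankDescending are made up for illustration and are not part of MongoDB.

```javascript
// Hypothetical miniature of the { hashtags: 1, rank: -1 } index: one entry per
// (hashtag, rank) pair, sorted by hashtag ascending, then rank descending.
const entries = [
  { hashtags: "noticias", rank: 9 },
  { hashtags: "noticias", rank: 4 },
  { hashtags: "noticiasdeportes", rank: 7 },
  { hashtags: "noticiasdeportes", rank: 2 },
  { hashtags: "noticiashoy", rank: 8 },
];

// True when ranks are already non-increasing, i.e. no extra sort is needed.
function isRankDescending(list) {
  return list.every((e, i) => i === 0 || list[i - 1].rank >= e.rank);
}

// Equality match: a single hashtag value, so the entries come out by rank.
const equality = entries.filter((e) => e.hashtags === "noticias");

// Prefix match: several hashtag values, each with its own run of ranks.
const prefix = entries.filter((e) => e.hashtags.startsWith("noticias"));

console.log(isRankDescending(equality)); // true
console.log(isRankDescending(prefix)); // false
```

In this model the exact match can stop after the first 15 entries, while the prefix range has to be read in full before the highest ranks are known.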

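The combination of scanAndOrder: true and the gap between nscanned (142,692) and n (15) in the explain output indicates that the index cannot supply the requested order. The cursor is still a BtreeCursor, so this is not a collection scan; instead, the query walks every index entry in the prefix range and then sorts the results in memory. The index is ordered by hashtags first and by rank only within each hashtags value, so a range covering several hashtags values is not ordered by rank, whereas an equality match on a single value is. You might be able to alleviate this issue by including the index keys in your sort criteria. From my testing: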

db.posts.find({hashtags: /^noticias/ }).limit(15).sort({hashtags:1, rank : -1}).explain()

This query does not require a scan and order, and both n and nscanned equal the number of records you are asking for. It also means the results are sorted on the hashtags key first, which may or may not be useful to you, but it should improve the performance of the query.
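The trade-off is that the two sorts can return different documents. The following plain JavaScript sketch illustrates this with made-up index entries and a limit of 2 instead of 15; it only imitates what the two queries would read, under the assumption that the entries are stored in index order.

```javascript
// Hypothetical index entries matching /^noticias/, already in index order
// (hashtags ascending, then rank descending). Values are made up.
const matches = [
  { hashtags: "noticias", rank: 9 },
  { hashtags: "noticias", rank: 4 },
  { hashtags: "noticiasdeportes", rank: 7 },
  { hashtags: "noticiashoy", rank: 8 },
];
const limit = 2;

// Like sort({hashtags: 1, rank: -1}).limit(2): read the first entries and stop.
const firstInIndexOrder = matches.slice(0, limit);

// Like sort({rank: -1}).limit(2): every match must be read and sorted first.
const topByRank = [...matches].sort((a, b) => b.rank - a.rank).slice(0, limit);

console.log(firstInIndexOrder.map((e) => e.rank)); // [ 9, 4 ]
console.log(topByRank.map((e) => e.rank)); // [ 9, 8 ]
```

If you need the globally highest ranks across all hashtags sharing the prefix, the faster sort order will not give you that result set.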
