如何同时使用$ gt和$ lte优化MongoDB查询? [英] How to optimize MongoDB query with both $gt and $lte?

查看:109
本文介绍了如何同时使用$ gt和$ lte优化MongoDB查询?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下查询,有点像反向范围查询:

I have the following query that is kind of like a reverse range lookup:

db.ip_ranges.find({ $and: [{ start_ip_num: { $lte: 1204135028 } }, { end_ip_num: { $gt: 1204135028 } }] })

仅使用$ lte标识符运行时,查询立即返回.但是当我在同一查询中同时运行$ gt和$ lte时,它非常慢(以秒为单位).

When run with only the $lte identifier, the query returns right away. But when I run with both the $gt and $lte in the same query, it is extremely slow (in seconds).

start_ip_num和end_ip_num字段均已建立索引.

Both the start_ip_num and end_ip_num fields are indexed.

如何优化此查询?

编辑

在查询上使用explain()函数时,我得到以下信息:

I get the following when I use the explain() function on the query:

{
    "cursor" : "BtreeCursor start_ip_num_1",
    "nscanned" : 452336,
    "nscannedObjects" : 452336,
    "n" : 1,
    "millis" : 2218,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "start_ip_num" : [
            [
                -1.7976931348623157e+308,
                1204135028
            ]
        ]
    }
}

编辑2

添加复合索引后,explain()函数将返回以下内容:

Once I added the compound index, the explain() function returns the following:

{
    "cursor" : "BtreeCursor start_ip_num_1_end_ip_num_1",
    "nscanned" : 431776,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 3433,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "start_ip_num" : [
            [
                -1.7976931348623157e+308,
                1204135028
            ]
        ],
        "end_ip_num" : [
            [
                1204135028,
                1.7976931348623157e+308
            ]
        ]
    }
}

但是,性能仍然很差(以秒为单位).

However, the perf is still poor (in seconds).

推荐答案

因此,在Mongo中不建议使用双范围查询.我假设您有一个包含两个{start_ip_num: 1, end_ip_num: 1}的索引.

So, double range queries are ill-advised in Mongo. I assume you have a single index containing both {start_ip_num: 1, end_ip_num: 1}.

如果这样还不能使您足够接近(通常,如果第一个字段返回的数据足够,它仍然很慢,因为它必须进行大量的B树扫描),您可以采取以下一种技巧使用2D框查询来解决此问题(一次仅适用于两个范围).

If that doesn't get you close enough (often it is still slow if you have enough data returned by the first field, since it has to do a lot of B-tree scanning), there is one trick you can do to combat this using 2D box queries (only works for two ranges at a time).

基本上,您将二维地理索引放在包含数组中两个点的字段上,例如[start_ip,end_ip],并为其提供足够高的最小/最大值,以使其不会达到默认情况下为-180/180.

Basically, you put a 2D geo index on a field containing the two points in an array, like [start_ip, end_ip], and give it a high enough min/max value so that it won't hit the limits which are by default just -180/180.

最后,使用边界查询,范围在框的一个角上从min到$ lte值,在框的另一个角上用gt和max值.请参见 http://www.mongodb.org/display/DOCS/Geospatial+ Indexing#GeospatialIndexing-BoundsQueries 用于语法.

Finally, use a bounds query with the range going from min to the $lte value on one corner of the box, and the gt and the max value on the other corner of the box. See http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-BoundsQueries for syntax.

它看起来像这样:

db.ip_ranges.find({ip_range:{$within:{$box:[[0, 1204135028], [1204135028, max]]}}});

其中max是您可以拥有的最大IP.

where max is the biggest ip you can have.

已经有一段时间了,因为我已经看过这个了,所以框可能是错的,但是这个概念很合理,这使得双范围查询的性能比常规的两字段B树索引更好.持续不到一秒钟(虽然通常是几百毫秒),而使用常规索引则要花费几秒钟-我认为我当时有数亿文档,但是已经有一段时间了,因此请认真地使用这些记住的基准盐.我敢肯定,结果会因您的数据和范围大小而有很大差异.

It's been a while since I've looked a this, so the box might be wrong, but the concept is sound, and it made the double range queries perform a bit better than with a regular two field B-tree index. Consistently under a second (although usually a few hundred ms), compared to a few seconds with the regular index--I think I had hundreds of millions of docs at the time, but it's been a while so take these remembered benchmarks with a grain of salt. Results will vary greatly depending on your data and range sizes, I'm sure.

更新:您可能想尝试使用bits设置,尝试使用低数字和高数字以查看是否有所不同.对我来说,它似乎并没有平均影响查询.参见 http://www.mongodb.org/display/DOCS/Geospatial+ Indexing#GeospatialIndexing-CreatingtheIndex 获取语法.

Update: You might want to experiment with the bits setting, trying a low number and a high number to see if it makes a difference. For me, it didn't seem to affect the queries on average. See http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-CreatingtheIndex for syntax.

这篇关于如何同时使用$ gt和$ lte优化MongoDB查询?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆